How to set up logging on dask distributed workers?
Question
After upgrading dask distributed to version 1.15.0, my logging stopped working.
I initialized Python logging with logging.config.dictConfig, and previously these settings propagated to all workers. But after the upgrade it no longer works.
If I call dictConfig right before every log call on every worker it works, but that's not a proper solution.
So the question is: how do I initialize logging on every worker before my computation graph starts executing, and do it only once per worker?
Update:
This hack worked on a dummy example but didn't make a difference on my system:
import distributed

def init_logging():
    # logging initialization happens here
    ...

client = distributed.Client()
# note: client.map schedules ordinary tasks, so there is no guarantee that
# exactly one task runs on each worker -- one reason this hack is unreliable
client.map(lambda _: init_logging(), client.ncores())
Update 2:
After digging through the documentation, this fixed the problem:
client.run(init_logging)
So the question now is: is this the proper way to solve this problem?
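For reference, a minimal sketch of what such an init_logging function might contain; the config dict below (formatter, handler names, levels) is illustrative, not taken from the original question:

```python
import logging
import logging.config

def init_logging():
    # Re-apply the logging configuration; when invoked via client.run(),
    # this executes once in each worker process.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "plain"},
        },
        "root": {"level": "INFO", "handlers": ["console"]},
    })

init_logging()
logging.getLogger("myapp").info("logging configured")
```

After creating the client, `client.run(init_logging)` then executes the same function in every currently active worker process.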
Answer
As of version 1.15.0 we now fork workers from a clean process, so changes that you make to your process prior to calling Client() won't affect forked workers. For more information, search for forkserver here: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
Your solution of using Client.run looks good to me. Client.run is currently (as of version 1.15.0) the best way to call a function on all currently active workers.
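A small sketch of that pattern, assuming the `distributed` package is installed (the in-process local cluster and the config dict are illustrative choices, not from the original answer). Client.run returns one result per worker, which makes it easy to confirm the configuration took effect everywhere:

```python
import logging
import logging.config
import distributed

def init_logging():
    # Configure logging inside the worker and report the resulting level.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "root": {"level": "INFO"},
    })
    return logging.getLogger().level

# An in-process local cluster keeps this sketch self-contained; against a
# real cluster you would pass the scheduler address to Client() instead.
client = distributed.Client(n_workers=2, processes=False)
levels = client.run(init_logging)  # one entry per active worker
print(levels)
client.close()
```

Note that client.run only reaches workers that are alive at the time of the call; workers that join later would need the function run again.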
It is worth noting that here you're setting up workers forked from the same process on a single computer. The trick you use above will not work in a distributed setting. I'm adding this note in case people come to this question asking how to handle logging with Dask in a cluster context.
Generally Dask does not move logs around. Instead, it is common that whatever mechanism you used to launch Dask handles this. Job schedulers like SGE/SLURM/Torque/PBS all do this. Cloud systems like YARN/Mesos/Marathon/Kubernetes all do this. The dask-ssh tool does this.