How to capture logs from workers from a Dask-Yarn job?
Question
I have tried using the following in ~/.config/dask/distributed.yaml and ~/.config/dask/yarn.yaml:
logging-file-config: "/path/to/config.ini"
or
logging:
  version: 1
  disable_existing_loggers: false
  root:
    level: INFO
    handlers: [consoleHandler]
  handlers:
    consoleHandler:
      class: logging.StreamHandler
      level: INFO
      formatter: sample_formatter
      stream: ext://sys.stderr
  formatters:
    sample_formatter:
      format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
and then in my function that gets evaluated at the worker:
import logging

import dask
from dask.distributed import Client
from dask_yarn import YarnCluster
from distributed.worker import logger

log = logging.getLogger(__name__)


@dask.delayed
def worker_func(args):
    logger.info("This will show up in the worker logs")
    log.info("This does not show up in worker logs")
    return


if __name__ == "__main__":
    arg_1 = None  # placeholder; the original post does not define arg_1
    dag_1 = {'worker_func': (worker_func, arg_1)}
    # The original post requested the key 'load-1', which is not in dag_1
    tasks = dask.get(dag_1, 'worker_func')
    log.info("This also shows up in logs, and custom formatted")
    cluster = YarnCluster()
    client = Client(cluster)
    dask.compute(tasks)
When I try to view the yarn logs using:
yarn logs -applicationId {application_id}
I do not see the log from log.info inside worker_func, but I do see the logs from distributed.worker.logger and from outside that function on the console. I also tried using client.get_worker_logs(), but that returned an empty dictionary. Is there a way to see customized logs from inside the function that gets evaluated at a worker?
Recommended answer
There's a lot going on in this question, so I'm going to answer "How do I configure logging for dask-yarn workers" and hope everything else becomes clear by answering that.
Dask's configuration system is loaded locally on the machine you start a dask cluster from (usually the edge node). This configuration is not distributed to the workers automatically; you're responsible for doing that yourself. You have a few options here:
- Have admin/IT put configuration in /etc/dask/ on every node, which will affect all users.
- Bundle configuration with your packaged environment. Dask will load configuration from {prefix}/etc/dask/, where prefix is sys.prefix.
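To see where {prefix}/etc/dask/ resolves for a given environment, a small standard-library check can help. This is only a sketch: candidate_config_dirs is a hypothetical helper name, and it lists just the two directories named above rather than every location dask may search.

```python
import os
import sys


def candidate_config_dirs():
    """Return the two config directories named above, resolved for this interpreter."""
    return [
        "/etc/dask",                              # system-wide, set up by admin/IT
        os.path.join(sys.prefix, "etc", "dask"),  # bundled with the environment
    ]


for directory in candidate_config_dirs():
    status = "exists" if os.path.isdir(directory) else "missing"
    print(f"{directory}: {status}")
```

Running this with the interpreter from the packaged environment shows whether the bundled directory is where you expect it before you ship the archive.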
For example, if you have a conda environment at /path/to/environment, you'd do the following to bundle the configuration:
# Create the configuration directory in the environment
mkdir -p /path/to/environment/etc/dask/
# Add your configuration to this directory
mv config.yaml /path/to/environment/etc/dask/config.yaml
# Package the environment
conda pack -p /path/to/environment -o environment.tar.gz
Any configuration values set in config.yaml will now be available on all the worker nodes. An example configuration file setting some logging configuration would be:
logging:
  version: 1
  root:
    level: INFO
    handlers: [consoleHandler]
  handlers:
    consoleHandler:
      class: logging.StreamHandler
      level: INFO
      formatter: sample_formatter
      stream: ext://sys.stderr
  formatters:
    sample_formatter:
      format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
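Before packaging, it can be useful to sanity-check a configuration like this locally. The sketch below feeds the same structure, written as a Python dict, to the standard library's logging.config.dictConfig. It assumes the logging section is ultimately applied in a dictConfig-compatible way, and the logger name my_app.tasks is a hypothetical stand-in for whatever logging.getLogger(__name__) resolves to in your own module.

```python
import logging
import logging.config

# Dict equivalent of the YAML "logging" section shown above
LOGGING_CONFIG = {
    "version": 1,
    "root": {"level": "INFO", "handlers": ["consoleHandler"]},
    "handlers": {
        "consoleHandler": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "sample_formatter",
            "stream": "ext://sys.stderr",
        }
    },
    "formatters": {
        "sample_formatter": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        }
    },
}

logging.config.dictConfig(LOGGING_CONFIG)

# A custom logger with no handlers of its own propagates to the root handler
log = logging.getLogger("my_app.tasks")  # hypothetical logger name
log.info("Custom logger message routed through the root handler")
```

If the message appears on stderr with the expected format, the same configuration should behave the same way wherever it is loaded.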
Logs from completed dask-yarn applications can be retrieved using the YARN CLI:
yarn logs -applicationId <application-id>
Logs for running dask-yarn applications can be retrieved using client.get_worker_logs(). Note that these logs will only contain logs written to the distributed.worker logger. You cannot write to your own logger and have its output appear in the output of client.get_worker_logs(). To write to this logger, get it via:
import logging
logger = logging.getLogger("distributed.worker")
logger.info("Writing with the worker logger")
Any logger appropriately configured to log to stdout or stderr will appear in the logs accessed via the yarn CLI, but only the distributed.worker logger output will also be available to get_worker_logs().
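If bundling a configuration file is not convenient, one alternative is to configure a logger from inside the task itself. The following is a minimal sketch, not part of dask or dask-yarn: get_task_logger and the logger name my_app.tasks are hypothetical, and it relies on the point made above that anything written to stderr ends up in the logs read through the yarn CLI. Its output would not appear in get_worker_logs().

```python
import logging
import sys

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def get_task_logger(name="my_app.tasks"):
    """Return a logger that writes INFO and above to stderr (hypothetical helper)."""
    task_logger = logging.getLogger(name)
    # Attach the handler only once, since a worker process runs many tasks
    if not task_logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        task_logger.addHandler(handler)
        task_logger.setLevel(logging.INFO)
        # Avoid duplicate lines if the root logger also has a handler
        task_logger.propagate = False
    return task_logger


def worker_func(args):
    log = get_task_logger()
    log.info("Processing %r", args)
    return args
```

Calling the helper inside the function, rather than at module import time on the client, means the handler is set up in the process that actually runs the task.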
Side note
"I have tried using the following in ~/.config/dask/distributed.yaml and ~/.config/dask/yarn.yaml"
The name of the config files doesn't matter; dask loads all yaml files in all config directories and merges their contents. For more information, please read the configuration docs.
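To illustrate what merging the contents means for nested keys, here is a small standalone sketch. It is not dask's implementation: merge_configs is a hypothetical helper, the two dicts stand in for the parsed contents of two yaml files, and letting later inputs win on conflicts is an assumption of this sketch.

```python
def merge_configs(*configs):
    """Recursively merge nested dicts; later inputs win on conflicting keys."""
    merged = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them instead of overwriting
                merged[key] = merge_configs(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical contents of two yaml files in the same config directory
file_a = {"logging": {"version": 1, "root": {"level": "INFO"}}}
file_b = {"logging": {"root": {"handlers": ["consoleHandler"]}}}

combined = merge_configs(file_a, file_b)
print(combined)
```

Because the result depends only on the keys, splitting settings across distributed.yaml and yarn.yaml or keeping them in a single file leads to the same combined configuration.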