How to capture logs from workers from a Dask-Yarn job?

Question

I have tried using the following in ~/.config/dask/distributed.yaml and ~/.config/dask/yarn.yaml:

logging-file-config: "/path/to/config.ini"

logging:
  version: 1
  disable_existing_loggers: false

  root:
    level: INFO
    handlers: [consoleHandler]

  handlers:
    consoleHandler:
      class: logging.StreamHandler
      level: INFO
      formatter: sample_formatter
      stream: ext://sys.stderr

  formatters:
    sample_formatter:
      format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

and then in my function that gets evaluated at the worker:

import logging
from distributed.worker import logger
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster

log = logging.getLogger(__name__)

@dask.delayed
def worker_func(args):
    logger.info("This will show up in the worker logs")
    log.info("This does not show up in worker logs")
    return

if __name__ == "__main__":
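    arg_1 = None  # placeholder: the original post does not show this value
    dag_1 = {'worker_func': (worker_func, arg_1)}
    tasks = dask.get(dag_1, 'worker_func')  # the requested key must exist in the graph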

    log.info("This also shows up in logs, and custom formatted")
    cluster = YarnCluster()
    client = Client(cluster)
    dask.compute(tasks)

When I try to view the yarn logs using:

yarn logs -applicationId {application_id}

I do not see the log from log.info inside worker_func, but I do see the logs from distributed.worker.logger and from outside that function on the console. I also tried using client.get_worker_logs(), but that returned an empty dictionary. Is there a way to see customized logs from inside the function that gets evaluated at a worker?

Recommended answer

There's a lot going on in this question, so I'm going to answer "How do I configure logging for dask-yarn workers" and hope everything else becomes clear by answering that.

Dask's configuration system is loaded locally on the machine you start a Dask cluster from (usually the edge node). This configuration is not distributed to the workers automatically; you're responsible for doing that yourself. You have a few options here:


  • Have admin/IT put configuration in /etc/dask/ on every node, which will affect all users.
  • Bundle configuration with your packaged environment. Dask will load configuration from {prefix}/etc/dask/, where prefix is sys.prefix.
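
To make the second option concrete, here is a minimal sketch that computes the {prefix}/etc/dask/ directory for the running interpreter and lists any YAML files in it. The helper names bundled_config_dir and list_yaml_files are hypothetical and only use the standard library; they do not call into Dask itself. Running this inside a task on a worker would be one way to check that the bundled file actually arrived.

```python
import os
import sys


def bundled_config_dir(prefix=sys.prefix):
    """Return the {prefix}/etc/dask directory for an environment prefix."""
    return os.path.join(prefix, "etc", "dask")


def list_yaml_files(directory):
    """List YAML files in a directory, or an empty list if it does not exist.

    Matching on .yaml/.yml is an assumption of this sketch.
    """
    if not os.path.isdir(directory):
        return []
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith((".yaml", ".yml"))
    )


config_dir = bundled_config_dir()
print(config_dir, list_yaml_files(config_dir))
```
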

For example, if you have a conda environment at /path/to/environment, you'd do the following to bundle the configuration:

# Create the configuration directory in the environment
mkdir -p /path/to/environment/etc/dask/
# Add your configuration to this directory
mv config.yaml /path/to/environment/etc/dask/config.yaml
# Package the environment
conda pack -p /path/to/environment -o environment.tar.gz

Any configuration values set in config.yaml will now be available on all the worker nodes. An example configuration file setting some logging configuration would be:

logging:
  version: 1

  root:
    level: INFO
    handlers: [consoleHandler]

  handlers:
    consoleHandler:
      class: logging.StreamHandler
      level: INFO
      formatter: sample_formatter
      stream: ext://sys.stderr

  formatters:
    sample_formatter:
      format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
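
To see what this configuration does, the sketch below applies the same mapping with the standard library's logging.config.dictConfig. It assumes that distributed passes a `logging` section containing a `version` key to dictConfig; the logger name "my_module" is a hypothetical stand-in for your own module's logger.

```python
import logging
import logging.config

# The YAML above, written out as the equivalent Python mapping.
LOGGING_CONFIG = {
    "version": 1,
    "root": {"level": "INFO", "handlers": ["consoleHandler"]},
    "handlers": {
        "consoleHandler": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "sample_formatter",
            "stream": "ext://sys.stderr",
        }
    },
    "formatters": {
        "sample_formatter": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        }
    },
}

logging.config.dictConfig(LOGGING_CONFIG)

# "my_module" is a hypothetical logger name; it has no handler of its own,
# so its records propagate to the root logger and are written to stderr.
log = logging.getLogger("my_module")
log.info("Custom logger output, formatted by sample_formatter")
```

Because the handler is attached to the root logger, any logger that propagates (the default) is written to stderr, which is where the YARN container logs pick it up.
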

Logs from completed dask-yarn applications can be retrieved using the YARN CLI:

yarn logs -applicationId <application-id>

Logs for running dask-yarn applications can be retrieved using client.get_worker_logs(). Note that these logs will only contain logs written to the distributed.worker logger. You cannot write to your own logger and have them appear in the output of client.get_worker_logs(). To write to this logger, get it via

import logging
logger = logging.getLogger("distributed.worker")
logger.info("Writing with the worker logger")

Any logger appropriately configured to log to stdout or stderr will appear in the logs accessed via the yarn CLI, but only the distributed.worker logger output will also be available to get_worker_logs().
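
The distinction can be illustrated with a rough standard-library analogy. The sketch below attaches an in-memory handler to the "distributed.worker" logger only, so records sent to an unrelated logger never reach it. DequeHandler and the "my_module" logger name are hypothetical, and this is not distributed's actual implementation, just a model of why only one logger's output is retrievable.

```python
import logging
from collections import deque


class DequeHandler(logging.Handler):
    """Keep the most recent records in memory (illustrative only)."""

    def __init__(self, maxlen=100):
        super().__init__()
        self.records = deque(maxlen=maxlen)

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


# The in-memory buffer is attached to the "distributed.worker" logger only.
worker_logger = logging.getLogger("distributed.worker")
worker_logger.setLevel(logging.INFO)
buffer = DequeHandler()
worker_logger.addHandler(buffer)

# A custom logger outside the "distributed.worker" hierarchy (hypothetical name).
custom_logger = logging.getLogger("my_module")
custom_logger.setLevel(logging.INFO)


def worker_func():
    worker_logger.info("captured")
    custom_logger.info("not captured")


worker_func()
print(list(buffer.records))
```

Only the record written through the "distributed.worker" logger ends up in the buffer; the custom logger's record would need its own handler (for example, one writing to stderr) to be seen anywhere.
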

Side note


I have tried using the following in ~/.config/dask/distributed.yaml and ~/.config/dask/yarn.yaml

The names of the config files don't matter: Dask loads all YAML files in all config directories and merges their contents. For more information, see the Dask configuration documentation.
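
As an illustration of what merging the contents means, here is a minimal recursive merge written with plain dictionaries. It is a sketch of the idea, not Dask's own code, and the two mappings and their values are hypothetical stand-ins for the parsed contents of two YAML files.

```python
def merge(*dicts):
    """Recursively merge mappings; later values win on conflicts."""
    result = {}
    for d in dicts:
        for key, value in d.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                # Both sides are mappings: merge them key by key.
                result[key] = merge(result[key], value)
            else:
                result[key] = value
    return result


# Hypothetical parsed contents of two config files in the same directory.
distributed_yaml = {"logging": {"version": 1, "root": {"level": "INFO"}}}
yarn_yaml = {
    "yarn": {"queue": "default"},
    "logging": {"root": {"handlers": ["consoleHandler"]}},
}

merged = merge(distributed_yaml, yarn_yaml)
print(merged)
```

The `logging` sections from both files end up in one nested mapping, which is why it makes no difference which file a given key lives in.
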
