Dask Distributed - Plugin for Monitoring Memory Usage


Problem Description

I have a distributed Dask cluster that I send a bunch of work to via Dask Distributed Client.

At the end of sending a bunch of work, I'd love to get a report or something that tells me what was the peak memory usage of each worker.

Is this possible via existing diagnostics tools? https://docs.dask.org/en/latest/diagnostics-distributed.html

Thanks! Best,

Answer

Specifically for memory, it's possible to extract information from the scheduler while it's running using client.scheduler_info() (the result can be dumped as JSON). For peak memory, you'd need an extra function that compares the current usage with the previous maximum and keeps the larger value.
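Such a helper could look like the following sketch. Note the ["metrics"]["memory"] key layout is an assumption about the shape of the dict that client.scheduler_info() returns; verify it against your distributed version:

```python
# Sketch of a peak-memory tracker built on client.scheduler_info().
# Assumption: each worker entry exposes its current memory (in bytes)
# under ["metrics"]["memory"]; check this against your deployment.

def update_peak_memory(peaks, scheduler_info):
    """Fold one scheduler_info() snapshot into a dict of per-worker peaks."""
    for addr, worker in scheduler_info.get("workers", {}).items():
        current = worker.get("metrics", {}).get("memory", 0)
        peaks[addr] = max(peaks.get(addr, 0), current)
    return peaks

# Call this periodically while work is running, e.g.:
#   peaks = {}
#   update_peak_memory(peaks, client.scheduler_info())
```

Since this only samples at the moments you call it, the recorded "peak" is a lower bound on the true maximum.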

For a lot of other useful information (though not peak memory consumption), there's the built-in performance report:

from dask.distributed import performance_report

with performance_report(filename="dask-report.html"):
    ...  # some dask computation

(Code from the docs: https://docs.dask.org/en/latest/diagnostics-distributed.html )

Update: there is also a dedicated plugin for dask to record min/max memory usage per task: https://github.com/itamarst/dask-memusage

Update 2: there is a nice blog post with code to track memory usage by dask: https://blog.dask.org/2021/03/11/dask_memory_usage
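In the same spirit as the snapshot approach above, the polling can be wrapped in a small background thread. This is only an illustrative sketch, not any library's API: the get_info callable stands in for client.scheduler_info, and the metrics key layout is again an assumption:

```python
import threading

class PeakMemoryMonitor:
    """Periodically sample a scheduler_info-style callable and keep the
    highest 'memory' metric seen per worker. Illustrative sketch only;
    in real use you would pass client.scheduler_info as get_info."""

    def __init__(self, get_info, interval=1.0):
        self._get_info = get_info
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peaks = {}  # worker address -> peak memory observed

    def sample_once(self):
        # Fold the current usage into the running per-worker peaks.
        for addr, worker in self._get_info().get("workers", {}).items():
            mem = worker.get("metrics", {}).get("memory", 0)
            self.peaks[addr] = max(self.peaks.get(addr, 0), mem)

    def _run(self):
        # Sample until stopped, sleeping `interval` seconds between polls.
        while not self._stop.is_set():
            self.sample_once()
            self._stop.wait(self._interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

Used as `with PeakMemoryMonitor(client.scheduler_info) as mon: ...`, after which `mon.peaks` holds the observed maxima. A coarse interval can miss short-lived spikes, which is the problem the dedicated plugin above addresses by sampling at the task level.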
