Dask Distributed - Plugin for Monitoring Memory Usage
Question
I have a distributed Dask cluster that I send a bunch of work to via the Dask Distributed Client.
At the end of sending all that work, I'd love to get a report that tells me the peak memory usage of each worker.
Is this possible via existing diagnostic tools? https://docs.dask.org/en/latest/diagnostics-distributed.html
Thanks!
Answer
Specifically for memory, it's possible to extract information from the scheduler (while it's running) using client.scheduler_info() (this can be dumped as JSON). For peak memory, there would have to be an extra function that compares the current usage with the previous usage and keeps the maximum.
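A minimal sketch of such a peak-tracking function is below. It assumes each worker entry returned by client.scheduler_info() carries its current memory under metrics["memory"] (the exact metrics layout can vary between distributed versions), and the update_peaks helper and the polling loop are illustrative, not part of the library:

```python
from collections import defaultdict

def update_peaks(peaks, info):
    """Fold one scheduler_info()-style snapshot into per-worker peaks.

    `peaks` maps worker address -> highest memory value seen so far;
    `info` is a dict shaped like client.scheduler_info() output.
    """
    for addr, worker in info.get("workers", {}).items():
        mem = worker.get("metrics", {}).get("memory", 0)
        peaks[addr] = max(peaks[addr], mem)
    return peaks

# Hypothetical usage: poll the scheduler while the computation runs.
# peaks = defaultdict(int)
# while work_is_running:
#     update_peaks(peaks, client.scheduler_info())
#     time.sleep(1)
```

Polling like this can miss short spikes between samples, which is one reason a scheduler- or worker-side plugin (as asked about in the question) is the more robust approach.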
For a lot of other useful information (though not peak memory consumption), there's the built-in performance report:
from dask.distributed import performance_report

with performance_report(filename="dask-report.html"):
    ## some dask computation
(Code from the docs: https://docs.dask.org/en/latest/diagnostics-distributed.html )
Update: there is also a dedicated Dask plugin that records min/max memory usage per task: https://github.com/itamarst/dask-memusage
Update 2: there is a nice blog post with code for tracking memory usage with Dask: https://blog.dask.org/2021/03/11/dask_memory_usage