Memory profiling on Google Cloud Dataflow
Question
What would be the best way to debug memory issues of a Dataflow job?

My job was failing with a GC OOM error, but when I profile it locally I cannot reproduce the exact scenarios and data volumes.

I'm running it now on 'n1-highmem-4' machines and I don't see the error anymore, but the job is very slow, so obviously using machines with more RAM is not the solution :)

Thanks for any advice, G
Answer
Please use the options --dumpHeapOnOOM and --saveHeapDumpsToGcsPath (see docs).
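For example, a job launch with these options enabled might look like the sketch below. This assumes a Beam/Dataflow Java pipeline run via Maven; the main class, project ID, and GCS bucket are placeholders, not names from the question.

```shell
# Sketch: launch a Dataflow (Beam Java) job with heap-dump options enabled.
# com.example.MyPipeline, my-project, and gs://my-bucket/heap-dumps are placeholders.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --dumpHeapOnOOM=true \
    --saveHeapDumpsToGcsPath=gs://my-bucket/heap-dumps"
```

With --dumpHeapOnOOM=true, a worker that hits an OutOfMemoryError writes a heap dump before dying, and --saveHeapDumpsToGcsPath uploads it to the given GCS location, where you can download it and inspect it with a heap-analysis tool such as Eclipse MAT or jvisualvm.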
This will only help if one of your workers actually OOMs. Additionally, if the job is not OOMing but you still observe high memory usage, you can try running jmap -dump PID on the harness process on the worker to obtain a heap dump at runtime.
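A rough sketch of taking such a runtime heap dump follows. The worker VM name, zone, and bucket are placeholders, and the pgrep pattern assumes a single Java harness process on the worker, which can vary by SDK version.

```shell
# SSH into one of the job's worker VMs (name and zone are placeholders).
gcloud compute ssh my-job-worker-01 --zone=us-central1-f

# On the worker: locate the Java harness process and dump its live heap.
PID=$(pgrep -f java | head -n 1)   # assumes one Java harness process
sudo jmap -dump:live,format=b,file=/tmp/harness-heap.hprof "$PID"

# Copy the dump off the VM for offline analysis, e.g. via GCS:
gsutil cp /tmp/harness-heap.hprof gs://my-bucket/heap-dumps/
```

Note that -dump:live forces a full GC first, so only objects still reachable end up in the dump; drop the live option if you also want to see garbage awaiting collection.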