Monitoring the Memory Usage of Spark Jobs


Problem Description

How can we get the overall memory used by a Spark job? I am not able to find the exact parameter we can refer to in order to retrieve it. I have looked at the Spark UI but am not sure which field to use. Also, in Ganglia we have the following options: a) Memory Buffer, b) Cache Memory, c) Free Memory, d) Shared Memory, e) Free Swap Space.

I am not able to find any option related to memory used. Does anyone have any idea about this?

Recommended Answer

If you persist your RDDs, you can see how big they are in memory via the UI.
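For reference, a minimal sketch of how to check this, assuming a Scala Spark application: a persisted RDD shows up on the "Storage" tab of the Spark UI (port 4040 on the driver by default), and the same figures can be read programmatically through SparkContext.getRDDStorageInfo (a developer API, so it may change between versions).

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddMemoryCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-memory-check").setMaster("local[*]"))

    // Persist an RDD in memory; it appears on the Spark UI's "Storage" tab
    // once an action has materialized it.
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
    rdd.persist(StorageLevel.MEMORY_ONLY)
    rdd.count() // an action forces the cache to be filled

    // Read the same numbers programmatically (developer API).
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id}: ${info.memSize} bytes in memory, " +
        s"${info.diskSize} bytes on disk")
    }

    sc.stop()
  }
}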

It's hard to get an idea of how much memory is being used for intermediate tasks (e.g. for shuffles). Basically, Spark will use as much memory as it needs, given what's available. This means that if your RDDs take up more than 50% of the available resources, your application may slow down because fewer resources are left for execution.
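The split between storage and execution that the answer alludes to is configurable. A minimal sketch, assuming Spark 1.6+ with the unified memory manager (defaults are version-dependent): spark.memory.fraction controls how much of the heap is shared between execution and storage, and spark.memory.storageFraction controls how much of that share cached RDDs can hold on to before execution can evict them.

import org.apache.spark.SparkConf

// Sketch of the relevant knobs; the values shown are the documented defaults.
val conf = new SparkConf()
  .setAppName("memory-tuning-sketch")
  // Fraction of (heap - 300 MB) shared by execution and storage.
  .set("spark.memory.fraction", "0.6")
  // Portion of the above that storage (cached RDDs) keeps even under memory
  // pressure from execution; roughly the "50%" mentioned in the answer above.
  .set("spark.memory.storageFraction", "0.5")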

