Spark execution memory monitoring


Problem description

What I want is to be able to monitor Spark execution memory, as opposed to the storage memory available in the SparkUI. To be clear: execution memory, not executor memory.

By execution memory I mean:

This region is used for buffering intermediate data when performing shuffles, joins, sorts and aggregations. The size of this region is configured through spark.shuffle.memoryFraction (default 0.2). - According to: Unified Memory Management in Spark 1.6
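Note that spark.shuffle.memoryFraction is the legacy (pre-1.6) setting; since Spark 1.6 the unified memory manager sizes a shared storage + execution pool from spark.memory.fraction (default 0.6 in Spark 2.x) and spark.memory.storageFraction (default 0.5), after subtracting roughly 300 MB of reserved memory. A back-of-the-envelope sketch of those region sizes (the function name is mine, not a Spark API):

```python
def unified_memory_regions(executor_heap_bytes,
                           memory_fraction=0.6,       # spark.memory.fraction (Spark 2.x default)
                           storage_fraction=0.5,      # spark.memory.storageFraction default
                           reserved_bytes=300 * 1024 * 1024):  # fixed reserved memory
    """Approximate sizes of the unified memory regions (Spark >= 1.6)."""
    usable = executor_heap_bytes - reserved_bytes
    unified = usable * memory_fraction       # pool shared by storage and execution
    storage = unified * storage_fraction     # storage side; evictable by execution
    execution = unified - storage            # execution side; can also borrow from storage
    return int(unified), int(storage), int(execution)

# e.g. a 4 GiB executor heap:
unified, storage, execution = unified_memory_regions(4 * 1024**3)
```

Because execution can borrow from (and evict) the storage side, these are starting-point sizes, not hard limits.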

After an intense search for answers I found nothing but unanswered StackOverflow questions, answers that relate only to storage memory, or vague answers of the type "use Ganglia", "use the Cloudera console", etc.

There seems to be demand for this information on Stack Overflow, and yet not a single satisfactory answer is available. Here are some top posts of StackOverflow when searching for "monitoring spark memory":

  • Monitor Spark execution and storage memory utilisation

  • Monitoring the Memory Usage of Spark Jobs

  • SPARK: How to monitor the memory consumption on Spark cluster?

  • Spark - monitor actual used executor memory

  • How can I monitor memory and CPU usage by spark application?

  • How to get memory and cpu usage by a Spark application?

Questions

Spark version > 2.0

  1. Is it possible to monitor the execution memory of a Spark job? By monitoring I mean at minimum seeing used/available, just like the storage memory shown per executor in the Executors tab of the SparkUI. Yes or no?

Could I do it with SparkListeners (@JacekLaskowski?)? What about the history server? Or is the only way through external tools? Grafana, Ganglia, others? If external tools, could you please point to a tutorial or provide some more detailed guidelines?
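One avenue worth noting (not covered by the accepted answer below): Spark's monitoring REST API exposes per-executor memory figures at /api/v1/applications/{app-id}/executors. The memoryUsed/maxMemory fields there cover only the storage pool, but in Spark 3.0+ the payload may also carry a peakMemoryMetrics object with entries such as OnHeapExecutionMemory. A hedged sketch that extracts both (the sample URL, host, and helper name are illustrative; field availability depends on your Spark version):

```python
import json
from urllib.request import urlopen  # used only in the live-usage comment below

def executor_memory_summary(executors):
    """Pick memory fields out of the /executors REST payload.

    memoryUsed / maxMemory describe the *storage* pool; peakMemoryMetrics
    (present in Spark 3.0+ payloads) carries peak *execution* memory.
    """
    rows = []
    for ex in executors:
        peak = ex.get("peakMemoryMetrics", {})
        rows.append({
            "id": ex["id"],
            "storage_used": ex["memoryUsed"],
            "storage_max": ex["maxMemory"],
            "peak_onheap_execution": peak.get("OnHeapExecutionMemory"),
        })
    return rows

# Live usage (driver UI must be reachable; app id comes from /api/v1/applications):
# url = "http://<driver-host>:4040/api/v1/applications/<app-id>/executors"
# print(executor_memory_summary(json.load(urlopen(url))))
```

The same endpoint is served by the history server for completed applications, so this also partially answers the history-server part of the question.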

I saw SPARK-9103 Tracking spark's memory usage; it seems it is not yet possible to monitor execution memory. SPARK-23206 Additional Memory Tuning Metrics also seems relevant.

Is Peak Execution Memory a reliable estimate of the usage/occupation of execution memory in a task? If, for example, the Stage UI says that a task uses 1 GB at peak, and I have 5 CPUs per executor, does that mean I need at least 5 GB of execution memory available on each executor to finish the stage?
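The multiplication implied by the question can be made explicit. The result is a pessimistic upper bound rather than a hard requirement: Spark's memory manager splits the execution pool dynamically across running tasks (each is guaranteed between 1/(2N) and 1/N of the pool for N concurrent tasks), and most execution-memory consumers spill to disk under pressure instead of failing. A sketch (the function name is mine):

```python
def execution_memory_needed(peak_per_task_bytes, executor_cores, task_cpus=1):
    """Worst-case execution memory if every concurrently running task hits
    its observed peak at the same time. Pessimistic: in practice Spark
    spills to disk under memory pressure rather than always failing."""
    concurrent_tasks = executor_cores // task_cpus  # tasks running at once per executor
    return peak_per_task_bytes * concurrent_tasks

# The example from the question: 1 GiB peak per task, 5 cores per executor
needed = execution_memory_needed(1 * 1024**3, executor_cores=5)
```

Under-provisioning against this bound therefore usually shows up as spilling and slower stages, not necessarily as an OOM.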

Are there some other proxies we could use to get a glimpse of execution memory?

Is there a way to know when execution memory starts to eat into storage memory? When my cached table disappears from the Storage tab in the SparkUI, or only part of it remains, does that mean it was evicted by execution memory?

Accepted answer

Answering my own question for future reference:

We are using Mesos as the cluster manager. In the Mesos UI I found a page that lists all executors on a given worker, and there one can find the memory usage of each executor. It seems to be the total memory usage (storage + execution). I can clearly see that when the memory fills up, the executor dies.

To access it:

  • Go to the Agents tab, which lists all cluster workers
  • Choose the worker
  • Choose the framework - the one with the name of your script
  • Inside you will find the list of executors for jobs running on this particular worker
  • For memory usage, see: Mem (Used / Allocated)

The same can be done for the driver. For the framework, choose the one with the name Spark Cluster.

If you want to know how to extract this number programmatically, see my response to this question: How to get Mesos Agents Framework Executor Memory
