What is the impact of the Spark UI on an application's memory usage?

Question

I have a Spark application (2.4.5) that uses Kafka as its source with big batch windows (5 minutes); in our application, we only really care about the RDD from that specific interval when processing data.

What is happening is that our application crashes from time to time with either an OutOfMemory exception on the driver (running in client mode) or a GC OutOfMemory on the executors. After a lot of research, it seemed that we were not handling the states properly, which was causing the lineage to grow indefinitely. We considered fixing the problem either by using a batch approach, where we control the offsets grabbed from Kafka and create the RDDs from them (which would truncate the lineage), or by enabling checkpointing.
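
As an aside, a minimal sketch of that batch approach, assuming the spark-streaming-kafka-0-10 connector and hypothetical broker, topic, and group names; because each window's RDD is built directly from offsets we track ourselves, nothing chains back to earlier windows and the lineage stays flat:

```scala
import scala.collection.JavaConverters._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

// Builds one standalone RDD for a single window, from offsets we track
// ourselves (e.g. persisted after each successful batch).
def readWindow(sc: SparkContext, from: Long, until: Long) = {
  val kafkaParams = Map[String, Object](
    "bootstrap.servers"  -> "broker:9092",                 // hypothetical broker
    "key.deserializer"   -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"           -> "batch-app"                    // hypothetical group
  ).asJava

  // topic, partition, fromOffset, untilOffset
  val ranges = Array(OffsetRange("events", 0, from, until))
  KafkaUtils.createRDD[String, String](sc, kafkaParams, ranges, LocationStrategies.PreferConsistent)
}
```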

During the investigation, someone found a loosely related issue (Yarn heap usage growing over time) that was solved by tweaking some UI parameters:

  • spark.ui.retainedJobs = 50
  • spark.ui.retainedStages = 50
  • spark.ui.retainedTasks = 500
  • spark.worker.ui.retainedExecutors = 50
  • spark.worker.ui.retainedDrivers = 50
  • spark.sql.ui.retainedExecutions = 50
  • spark.streaming.ui.retainedBatches = 50
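
For completeness, a minimal sketch of setting these on the application side, assuming they are passed through SparkConf at startup (they could equally be --conf flags on spark-submit). Note that the two spark.worker.ui.* entries configure the standalone Worker daemon rather than the application, so they are omitted here:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// These are read once at startup; they cannot be changed on a running session.
val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "50")
  .set("spark.ui.retainedStages", "50")
  .set("spark.ui.retainedTasks", "500")
  .set("spark.sql.ui.retainedExecutions", "50")
  .set("spark.streaming.ui.retainedBatches", "50")

val spark = SparkSession.builder().config(conf).getOrCreate()
```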

Since these are UI parameters, it doesn't make sense to me that they would affect the application's memory usage, unless they affect the way the application stores information to send to the UI. Early tests show that the application does indeed run longer without OOM issues.

Can anyone explain what impact these parameters have on applications? Can they really affect an application's memory usage? Are there any other parameters I should look into to get the whole picture (I'm wondering if there is a "factor" parameter that needs to be tweaked so that memory allocation is appropriate for our case)?

Thanks

Answer

After a lot of testing, our team managed to narrow the problem down to this particular parameter:

spark.sql.ui.retainedExecutions

I decided to dig in, so I downloaded Spark's code. I found out that information about the Parsed Logical Plan is kept in the application's memory, and that how long it is retained is controlled by this parameter.

When a SparkSession is created, one of the many objects that are instantiated is the SQLAppStatusListener. This class implements two methods:

onExecutionStart - On every execution, this creates a new SparkPlanGraphWrapper, which holds references to the Parsed Logical Plan, and adds it to a SharedState object that, in this case, keeps track of how many instances of the object have been created.

cleanupExecution - Removes the SparkPlanGraphWrapper from the SharedState object if the number of stored objects is greater than the value of spark.sql.ui.retainedExecutions, which defaults to 1000.
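
To make the retention mechanics concrete, here is a self-contained toy model of the behavior described above; this is my own sketch, not Spark's actual source (the real listener stores the wrappers in an internal element-tracking store):

```scala
import scala.collection.mutable

// Stand-in for the real SparkPlanGraphWrapper: the plan stays referenced
// (and thus on the heap) for as long as the wrapper itself is retained.
case class PlanGraphWrapper(executionId: Long, parsedPlan: Array[Byte])

class ToyStatusListener(retainedExecutions: Int) {
  // Insertion-ordered so the oldest execution is evicted first.
  private val store = mutable.LinkedHashMap.empty[Long, PlanGraphWrapper]

  def onExecutionStart(id: Long, plan: Array[Byte]): Unit = {
    store(id) = PlanGraphWrapper(id, plan)
    cleanupExecution()
  }

  private def cleanupExecution(): Unit = {
    while (store.size > retainedExecutions) {
      store -= store.head._1 // drop the oldest wrapper, freeing its plan
    }
  }
}
```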

In our case specifically, the logical plan was taking 4 MB of memory, so, simplistically, we would have to allocate 4 GB of memory to accommodate the retained executions.
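
A back-of-the-envelope check of that figure, plus a sketch of the obvious mitigation (the session-builder pattern is standard; the limit of 50 is just the value used in the question):

```scala
// 1000 retained executions (the default) at ~4 MB per parsed plan.
val retained   = 1000
val planSizeMB = 4
println(f"~${retained * planSizeMB / 1024.0}%.1f GB of driver heap") // ~3.9 GB

// Mitigation sketch: lower the limit when building the session.
// spark.sql.ui.retainedExecutions is a static conf, so it must be
// set before the SparkSession is created, not changed afterwards.
val spark = org.apache.spark.sql.SparkSession.builder()
  .config("spark.sql.ui.retainedExecutions", "50")
  .getOrCreate()
```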
