What is the impact of the Spark UI on an application's memory usage?

Question

I have a Spark application (2.4.5) that uses Kafka as its source with big batch windows (5 minutes); in our application, we only really care about the RDD from that specific interval when processing data.

What is happening is that our application crashes from time to time with either an OutOfMemory exception on the driver (running in client mode) or a GC OutOfMemory on the executors. After a lot of research, it seemed that we were not handling the states properly, which was causing the lineage to grow indefinitely. We considered fixing the problem either by using a batch approach, where we control the offsets grabbed from Kafka and create the RDDs from them (which would truncate the lineage), or by enabling checkpointing.
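
As an aside, a minimal sketch of that batch approach, assuming the spark-streaming-kafka-0-10 connector and hypothetical broker, topic, and group names; because each window's RDD is built directly from offsets we track ourselves, nothing chains back to earlier windows and the lineage stays flat:

```scala
import scala.collection.JavaConverters._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

// Builds one standalone RDD for a single window, from offsets we track
// ourselves (e.g. persisted after each successful batch).
def readWindow(sc: SparkContext, from: Long, until: Long) = {
  val kafkaParams = Map[String, Object](
    "bootstrap.servers"  -> "broker:9092",                 // hypothetical broker
    "key.deserializer"   -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"           -> "batch-app"                    // hypothetical group
  ).asJava

  // topic, partition, fromOffset, untilOffset
  val ranges = Array(OffsetRange("events", 0, from, until))
  KafkaUtils.createRDD[String, String](sc, kafkaParams, ranges, LocationStrategies.PreferConsistent)
}
```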

During the investigation, someone found a loosely related issue (Yarn heap usage growing over time) that was solved by tweaking some UI parameters:

  • spark.ui.retainedJobs = 50
  • spark.ui.retainedStages = 50
  • spark.ui.retainedTasks = 500
  • spark.worker.ui.retainedExecutors = 50
  • spark.worker.ui.retainedDrivers = 50
  • spark.sql.ui.retainedExecutions = 50
  • spark.streaming.ui.retainedBatches = 50
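
For completeness, a minimal sketch of setting these on the application side, assuming they are passed through SparkConf at startup (they could equally be --conf flags on spark-submit). Note that the two spark.worker.ui.* entries configure the standalone Worker daemon rather than the application, so they are omitted here:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// These are read once at startup; they cannot be changed on a running session.
val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "50")
  .set("spark.ui.retainedStages", "50")
  .set("spark.ui.retainedTasks", "500")
  .set("spark.sql.ui.retainedExecutions", "50")
  .set("spark.streaming.ui.retainedBatches", "50")

val spark = SparkSession.builder().config(conf).getOrCreate()
```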

Since these are UI parameters, it doesn't make sense to me that they would affect the application's memory usage, unless they affect the way the application stores information to send to the UI. Early tests show that the application does indeed run longer without OOM issues.

Can anyone explain what impact these parameters have on applications? Can they really affect an application's memory usage? Are there any other parameters I should look into to get the whole picture (I'm wondering if there is a "factor" parameter that needs to be tweaked so that memory allocation is appropriate for our case)?

Thanks

Answer

After a lot of testing, our team managed to narrow the problem down to this particular parameter:

spark.sql.ui.retainedExecutions

I decided to dig in, so I downloaded Spark's code. I found out that information about the Parsed Logical Plan is kept in the application's memory, and that how long it is retained is controlled by this parameter.

When a SparkSession is created, one of the many objects that are instantiated is the SQLAppStatusListener. This class implements two methods:

onExecutionStart - On every execution, this creates a new SparkPlanGraphWrapper, which holds references to the Parsed Logical Plan, and adds it to a SharedState object that, in this case, keeps track of how many instances of the object have been created.

cleanupExecution - Removes the SparkPlanGraphWrapper from the SharedState object if the number of stored objects is greater than the value of spark.sql.ui.retainedExecutions, which defaults to 1000.
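
To make the retention mechanics concrete, here is a self-contained toy model of the behavior described above; this is my own sketch, not Spark's actual source (the real listener stores the wrappers in an internal element-tracking store):

```scala
import scala.collection.mutable

// Stand-in for the real SparkPlanGraphWrapper: the plan stays referenced
// (and thus on the heap) for as long as the wrapper itself is retained.
case class PlanGraphWrapper(executionId: Long, parsedPlan: Array[Byte])

class ToyStatusListener(retainedExecutions: Int) {
  // Insertion-ordered so the oldest execution is evicted first.
  private val store = mutable.LinkedHashMap.empty[Long, PlanGraphWrapper]

  def onExecutionStart(id: Long, plan: Array[Byte]): Unit = {
    store(id) = PlanGraphWrapper(id, plan)
    cleanupExecution()
  }

  private def cleanupExecution(): Unit = {
    while (store.size > retainedExecutions) {
      store -= store.head._1 // drop the oldest wrapper, freeing its plan
    }
  }
}
```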

In our case specifically, the logical plan was taking 4 MB of memory, so, simplistically, we would have to allocate 4 GB of memory to accommodate the retained executions.
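
A back-of-the-envelope check of that figure, plus a sketch of the obvious mitigation (the session-builder pattern is standard; the limit of 50 is just the value used in the question):

```scala
// 1000 retained executions (the default) at ~4 MB per parsed plan.
val retained   = 1000
val planSizeMB = 4
println(f"~${retained * planSizeMB / 1024.0}%.1f GB of driver heap") // ~3.9 GB

// Mitigation sketch: lower the limit when building the session.
// spark.sql.ui.retainedExecutions is a static conf, so it must be
// set before the SparkSession is created, not changed afterwards.
val spark = org.apache.spark.sql.SparkSession.builder()
  .config("spark.sql.ui.retainedExecutions", "50")
  .getOrCreate()
```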
