How does Spark running on YARN account for Python memory usage?


Question

After reading through the documentation I do not understand how Spark running on YARN accounts for Python memory consumption.

Does it count towards spark.executor.memory, spark.executor.memoryOverhead, or somewhere else?

In particular I have a PySpark application with spark.executor.memory=25G and spark.executor.cores=4, and I encounter frequent "Container killed by YARN for exceeding memory limits" errors when running a map on an RDD. It operates on a fairly large number of complex Python objects, so it is expected to take up a non-trivial amount of memory, but not 25GB. How should I configure the different memory variables for use with heavy Python code?

Answer

I'd try increasing spark.python.worker.memory from its default (512m), given the heavy Python code; this property's value does not count towards spark.executor.memory.

Amount of memory to use per python worker process during aggregation, in the same format as JVM memory strings (e.g. 512m, 2g). If the memory used during aggregation goes above this amount, it will spill the data into disks. link
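
A minimal sketch of how this property might be set when building a PySpark session (the 2g value is only an illustrative assumption, not a tuned recommendation):

from pyspark.sql import SparkSession

# Raise the per-Python-worker aggregation memory from the 512m default.
# 2g is an arbitrary example value; size it to your workload.
spark = (
    SparkSession.builder
    .appName("heavy-python-job")
    .config("spark.python.worker.memory", "2g")
    .getOrCreate()
)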

Calculation of ExecutorMemoryOverhead in Spark:

MEMORY_OVERHEAD_FRACTION = 0.10
MEMORY_OVERHEAD_MINIMUM = 384
val executorMemoryOverhead =
  max(MEMORY_OVERHEAD_FRACTION * ${spark.executor.memory}, MEMORY_OVERHEAD_MINIMUM)
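
Plugging the asker's spark.executor.memory=25G into that formula gives a rough idea of the container request (a back-of-the-envelope check, values in MB):

# Back-of-the-envelope check for spark.executor.memory = 25G (values in MB)
MEMORY_OVERHEAD_FRACTION = 0.10
MEMORY_OVERHEAD_MINIMUM = 384

executor_memory = 25 * 1024                           # 25600 MB
executor_memory_overhead = max(
    MEMORY_OVERHEAD_FRACTION * executor_memory,       # 2560 MB
    MEMORY_OVERHEAD_MINIMUM,
)
container_limit = executor_memory + executor_memory_overhead
print(container_limit)                                # 28160 MB, i.e. ~27.5 GB per container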

The property is spark.{yarn|mesos}.executor.memoryOverhead for YARN and Mesos.

YARN kills processes that take more memory than they requested, where the request is the sum of executorMemoryOverhead and executorMemory.

In the image from the original answer, the Python processes in the worker use spark.python.worker.memory, while spark.yarn.executor.memoryOverhead + spark.executor.memory goes to the executor JVM itself.
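
Since those Python worker processes live outside the JVM heap but still inside the YARN container, one common adjustment is to give the container more headroom above spark.executor.memory. A hedged sketch, where both values are illustrative assumptions rather than tuned figures:

from pyspark import SparkConf, SparkContext

# Trade a little JVM heap for off-heap headroom so the Python workers
# fit inside the YARN container alongside the executor JVM.
conf = (
    SparkConf()
    .set("spark.executor.memory", "20g")                # example: smaller JVM heap
    .set("spark.yarn.executor.memoryOverhead", "4096")  # example: 4 GB of headroom, in MB
)
sc = SparkContext(conf=conf)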
