Container killed by YARN for exceeding memory limits
Question
I am creating a cluster in Google Dataproc with the following characteristics:
- Master: Standard (1 master, N workers)
- Machine type: n1-highmem-2 (2 vCPU, 13.0 GB memory)
- Primary disk: 250 GB
- Worker nodes: 2
- Machine type: n1-highmem-2 (2 vCPU, 13.0 GB memory)
- Primary disk size: 250 GB
I am also adding, in Initialization actions, the .sh file from this repository in order to use Zeppelin.
The code that I use works fine with some data, but if I use a bigger amount of it, I get the following error:
Container killed by YARN for exceeding memory limits. 4.0 GB of 4 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
I have seen posts such as this one: Container killed by YARN for exceeding memory..., where it is recommended to change yarn.nodemanager.vmem-check-enabled to false.
I am a bit confused, though. Do all these configurations take effect when I initialize the cluster or not?
Also, where exactly is yarn-site.xml located? I am unable to find it on the master (it is not in /usr/lib/zeppelin/conf/, /usr/lib/spark/conf, or /usr/lib/hadoop-yar/) in order to change it, and if I change it, what do I need to 'restart'?
Answer
Igor is correct: the easiest thing to do is to create a cluster and specify any additional properties to set before the services start.
However, it's a little scary to entirely disable YARN's check that containers stay within their bounds. Either way, your VM will eventually run out of memory.
The error message is correct -- you should try bumping up spark.yarn.executor.memoryOverhead. It defaults to max(384m, 0.1 * spark.executor.memory). On an n1-highmem-2, that ends up being 384m, since spark.executor.memory=3712m. You can set this value when creating a cluster by using --properties spark:spark.yarn.executor.memoryOverhead=512m.
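The default described above can be sketched in a few lines. This is a minimal illustration of the max(384m, 0.1 * spark.executor.memory) rule, not Spark's actual implementation; the function name is my own:

```python
def default_memory_overhead_mb(executor_memory_mb: int) -> int:
    """Mirror the documented default: the larger of 384 MiB
    and 10% of spark.executor.memory."""
    return max(384, int(0.10 * executor_memory_mb))

# On an n1-highmem-2 Dataproc node, spark.executor.memory=3712m,
# and 10% of that (371m) is below the 384m floor:
print(default_memory_overhead_mb(3712))  # 384
```

This is why bumping the overhead explicitly (e.g. to 512m) helps: the default floor is all you get on this machine type.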
If I understand correctly, the JVM and Spark try to be intelligent about keeping memory usage within spark.executor.memory - memoryOverhead. However, the Python interpreter (where your pyspark code actually runs) is outside their accounting and instead falls under memoryOverhead. If you are using a lot of memory in the Python process, you will need to increase memoryOverhead.
Here are some resources on pyspark and Spark's memory management:
- How does Spark running on YARN account for Python memory usage?
- https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html
- http://spark.apache.org/docs/latest/tuning.html#memory-management-overview