Container killed by YARN for exceeding memory limits


Question

I am creating a cluster in Google Dataproc with the following characteristics:

Master Standard (1 master, N workers)
  Machine       n1-highmem-2 (2 vCPU, 13.0 GB memory)
  Primary disk  250 GB

Worker nodes    2
  Machine type  n1-highmem-2 (2 vCPU, 13.0 GB memory)
  Primary disk  size    250 GB

I am also adding, under Initialization actions, the .sh file from this repository in order to use Zeppelin.

The code that I use works fine with some data, but with a larger amount I get the following error:

Container killed by YARN for exceeding memory limits. 4.0 GB of 4 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

I have seen posts such as this one: Container killed by YARN for exceeding memory..., where it is recommended to set yarn.nodemanager.vmem-check-enabled to false.

I am a bit confused though: are all these configurations applied when I initialize the cluster, or not?

Also, where exactly is yarn-site.xml located? I am unable to find it on the master (I can't find it in /usr/lib/zeppelin/conf/, /usr/lib/spark/conf, or /usr/lib/hadoop-yar/) in order to change it, and if I change it, what do I need to 'restart'?

Answer

Igor is correct, the easiest thing to do is create a cluster and specify any additional properties to set before starting the services.

However, disabling YARN's check that containers stay within their bounds is a little scary. Either way, your VM will eventually run out of memory.

The error message is correct -- you should try bumping up spark.yarn.executor.memoryOverhead. It defaults to max(384m, 0.1 * spark.executor.memory). On an n1-highmem-2, that ends up being 384m since spark.executor.memory=3712m. You can set this value when creating a cluster by using --properties spark:spark.yarn.executor.memoryOverhead=512m.
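As a sanity check, the default can be computed directly. This is a small illustrative sketch (`default_overhead_mb` is a hypothetical helper, not a Spark API); note how the total matches the "4 GB" container limit in the error message:

```python
# Spark's default executor memory overhead: max(384 MB, 10% of executor memory).
def default_overhead_mb(executor_memory_mb: int) -> int:
    return max(384, int(0.10 * executor_memory_mb))

executor_memory_mb = 3712  # spark.executor.memory on an n1-highmem-2
overhead_mb = default_overhead_mb(executor_memory_mb)

print(overhead_mb)                       # 384
print(executor_memory_mb + overhead_mb)  # 4096 MB -> the 4 GB YARN container
```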

If I understand correctly, the JVM and Spark try to be intelligent about keeping memory usage within spark.executor.memory - memoryOverhead. However, the python interpreter (where your pyspark code actually runs) is outside their accounting, and instead falls under memoryOverhead. If you are using a lot of memory in the python process, you will need to increase memoryOverhead.
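A toy illustration of that accounting (a simplified model, not actual YARN code): the JVM can be well within its heap while a memory-hungry Python worker pushes the container's total physical memory past the limit.

```python
def container_killed(jvm_rss_mb: int, python_rss_mb: int,
                     executor_memory_mb: int = 3712,
                     overhead_mb: int = 384) -> bool:
    """Simplified model: YARN kills the container when total physical memory
    (JVM plus Python workers) exceeds executor memory + memoryOverhead."""
    limit_mb = executor_memory_mb + overhead_mb
    return jvm_rss_mb + python_rss_mb > limit_mb

# JVM under its 3712 MB budget, but a 600 MB Python process overshoots
# the 4096 MB container:
print(container_killed(jvm_rss_mb=3600, python_rss_mb=600))   # True
# Raising memoryOverhead to 1024 MB makes the same workload fit:
print(container_killed(3600, 600, overhead_mb=1024))          # False
```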

Here are some resources on pyspark and Spark's memory management:

  • How does Spark running on YARN account for Python memory usage?
  • https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html
  • http://spark.apache.org/docs/latest/tuning.html#memory-management-overview
