Spark Yarn Memory configuration

Question

I have a Spark application that keeps failing with the error:

"Diagnostics: Container [pid=29328,containerID=container_e42_1512395822750_0026_02_000001] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used. Killing container."

I saw lots of different parameters that were suggested for increasing the physical memory. Can I please have some explanation of the following parameters?

  • mapreduce.map.memory.mb (currently set to 0, so it is supposed to take the default, which is 1 GB; so why do we see it as 1.5 GB? Changing it also didn't affect the number)

  • mapreduce.reduce.memory.mb (currently set to 0, so it is supposed to take the default, which is 1 GB; so why do we see it as 1.5 GB? Changing it also didn't affect the number)

  • mapreduce.map.java.opts/mapreduce.reduce.java.opts (set to 80% of the previous number)

  • yarn.scheduler.minimum-allocation-mb=1GB (when changing this I see the effect on the max physical memory, but for the value of 1 GB it is still 1.5 GB)

  • yarn.app.mapreduce.am.resource.mb/spark.yarn.executor.memoryOverhead (can't find these at all in the configuration)

We are defining YARN (running with the yarn-cluster deployment mode) using Cloudera CDH 5.12.1.

Answer

spark.driver.memory
spark.executor.memory

These control the base amount of memory Spark will try to allocate for its driver and for all the executors. These are probably the ones you want to increase if you are running out of memory.
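
For example, a minimal sketch of raising both values when submitting in yarn-cluster mode (the memory sizes and the application jar name below are placeholders, not values taken from the question):

# minimal sketch: the memory sizes and jar name are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 2g \
  your-app.jar

The --driver-memory and --executor-memory flags are shorthand for setting spark.driver.memory and spark.executor.memory.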

// options before Spark 2.3.0
spark.yarn.driver.memoryOverhead
spark.yarn.executor.memoryOverhead

// options in Spark 2.3.0 and later
spark.driver.memoryOverhead
spark.executor.memoryOverhead

This value is an additional amount of memory to request when you are running Spark on YARN. It is intended to account for the extra RAM needed by the YARN container that is hosting your Spark executors.
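
If the overhead is not set explicitly, recent Spark versions request roughly max(384 MB, 10% of the base memory) on top of the base memory, so with a 1 GB executor or driver Spark asks YARN for about 1024 MB + 384 MB = 1408 MB; if the scheduler then rounds requests up in 512 MB steps, that would line up with the 1.5 GB container in the error above. A hedged sketch of raising the overhead explicitly on a pre-2.3.0 Spark (the sizes are illustrative; the overhead value is given in MB):

# hedged sketch: values are illustrative, overhead is specified in MB
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  your-app.jar

With those values each executor request becomes roughly 2 GB + 1 GB = 3 GB before YARN's rounding.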

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb

When Spark asks YARN to reserve a block of RAM for an executor, it asks for the base memory plus the overhead memory. However, YARN may not give it back a container of exactly that size. These parameters control the smallest and the largest container size that YARN will grant. If you are only using the cluster for one job, I find it easiest to set these to a very small and a very large value, and then use the Spark memory settings mentioned above to set the true container size.
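
As a rough sketch of that approach, the two properties could look something like this in the YARN configuration (the numbers are arbitrary examples, not recommendations for this cluster):

yarn.scheduler.minimum-allocation-mb=256
yarn.scheduler.maximum-allocation-mb=16384

YARN also rounds each container request up to its allocation granularity, which is why the memory reported for a container is often somewhat larger than the base memory plus overhead that Spark asked for.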

mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts/mapreduce.reduce.java.opts

I don't think these have any bearing on your Spark/YARN job.
