Spark on YARN resource manager: Relation between YARN Containers and Spark Executors
Question
I'm new to Spark on YARN and don't understand the relation between YARN `Containers` and Spark `Executors`. I tried out the following configuration, based on the results of the `yarn-utils.py` script, which can be used to find an optimal cluster configuration.
The Hadoop cluster (HDP 2.4) I'm working on:

- 1 Master Node:
  - CPU: 2 CPUs with 6 cores each = 12 cores
  - RAM: 64 GB
  - SSD: 2 x 512 GB
- Worker Nodes:
  - CPU: 2 CPUs with 6 cores each = 12 cores
  - RAM: 64 GB
  - HDD: 4 x 3 TB = 12 TB
So I ran

```
python yarn-utils.py -c 12 -m 64 -d 4 -k True
```

(c=cores, m=memory, d=hdds, k=hbase-installed) and got the following result:

```
Using cores=12 memory=64GB disks=4 hbase=True
Profile: cores=12 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4915m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4915m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4915m
mapreduce.task.io.sort.mb=2457
```
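The arithmetic behind these numbers can be reproduced with a small sketch. This is not the actual `yarn-utils.py` source; it assumes the HDP sizing heuristic (containers = minimum of 2 × cores, ⌈1.8 × disks⌉, and usable memory divided by a minimum container size), and the `reserved_gb` and `min_container_mb` defaults are assumptions chosen to match the output above:

```python
import math

def estimate_containers(cores, mem_gb, disks, reserved_gb=16, min_container_mb=2048):
    """Rough re-derivation of the yarn-utils.py output above (assumed heuristic)."""
    usable_mb = (mem_gb - reserved_gb) * 1024            # 64 GB - 16 GB reserved = 49152 MB
    containers = int(min(2 * cores,                      # CPU-bound limit
                         math.ceil(1.8 * disks),         # disk-bound limit
                         usable_mb / min_container_mb))  # memory-bound limit
    container_ram_mb = max(min_container_mb, usable_mb // containers)
    return containers, container_ram_mb

print(estimate_containers(cores=12, mem_gb=64, disks=4))  # -> (8, 6144)
```

`Num Container=8` and `Container Ram=6144MB` match the script's output, and 8 × 6144 MB also gives the reported 48 GB of used RAM.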
I applied these settings via the Ambari interface and restarted the cluster. The values also roughly match what I had calculated manually before.
My problem now is to find the optimal settings for my `spark-submit` script, i.e. the parameters `--num-executors`, `--executor-cores` & `--executor-memory`.
However, I found the post What is a container in YARN?, but it didn't really help, as it doesn't describe the relation to the executors.
Can someone help to solve one or more of these questions?
Answer
I will report my insights here step by step:
- First of all, an important fact (source):

  > When running Spark on YARN, each Spark executor runs as a YARN container. [...]
- This means the number of containers will always be the same as the number of executors created by a Spark application, e.g. via the `--num-executors` parameter in `spark-submit`.

- Set by `yarn.scheduler.minimum-allocation-mb`, every container always allocates at least this amount of memory. This means that if the parameter `--executor-memory` is set to e.g. only `1g` but `yarn.scheduler.minimum-allocation-mb` is e.g. `6g`, the container is much bigger than needed by the Spark application.

- The other way round, if the parameter `--executor-memory` is set to something higher than the `yarn.scheduler.minimum-allocation-mb` value, e.g. `12g`, the container will allocate more memory dynamically, but only if the requested amount of memory is smaller than or equal to the `yarn.scheduler.maximum-allocation-mb` value.
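To make the two cases above concrete, here is a minimal sketch of how an executor's memory request turns into a container size. The 10 % / 384 MB overhead is Spark's default executor memory overhead on YARN; YARN's additional rounding to `yarn.scheduler.increment-allocation-mb` steps is left out for simplicity, and the min/max defaults are the values from the cluster configuration above:

```python
def yarn_container_size_mb(executor_memory_mb,
                           min_alloc_mb=6144,       # yarn.scheduler.minimum-allocation-mb
                           max_alloc_mb=49152,      # yarn.scheduler.maximum-allocation-mb
                           overhead_fraction=0.10,  # default executor memory overhead factor
                           min_overhead_mb=384):
    # Spark requests the executor heap plus an off-heap overhead from YARN.
    request = executor_memory_mb + max(min_overhead_mb,
                                       int(overhead_fraction * executor_memory_mb))
    if request > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    # YARN never grants less than the configured minimum allocation.
    return max(request, min_alloc_mb)

print(yarn_container_size_mb(1024))   # --executor-memory 1g  -> 6144 (minimum wins)
print(yarn_container_size_mb(12288))  # --executor-memory 12g -> 13516 (request wins)
```

With the `1g` executor, most of the 6 GB container sits idle; with the `12g` executor, the container grows to fit the request because it is still below the 48 GB maximum allocation.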
- The value of `yarn.nodemanager.resource.memory-mb` determines how much memory can be allocated in sum by all containers of one host!

=> So setting `yarn.scheduler.minimum-allocation-mb` to a small value allows you to run smaller containers, e.g. for smaller executors (otherwise memory would be wasted).

=> Setting `yarn.scheduler.maximum-allocation-mb` to the maximum value (e.g. equal to `yarn.nodemanager.resource.memory-mb`) allows you to define bigger executors (more memory is allocated if needed, e.g. via the `--executor-memory` parameter).
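As a hypothetical back-of-the-envelope check under the settings above: with `yarn.nodemanager.resource.memory-mb=49152` capping each host and 6144 MB containers, a node can host at most eight executor containers:

```python
def executors_per_node(node_mem_mb, container_mb):
    # yarn.nodemanager.resource.memory-mb caps the sum of all container
    # allocations on one host, so the fit is a simple integer division.
    return node_mem_mb // container_mb

print(executors_per_node(node_mem_mb=49152, container_mb=6144))  # -> 8
```

This matches the `Num Container=8` suggestion from `yarn-utils.py` above.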