Apache Hadoop Yarn - Underutilization of cores
Problem Description
No matter how much I tinker with the settings in yarn-site.xml, i.e. using all of the options below:
yarn.scheduler.minimum-allocation-vcores
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.maximum-allocation-vcores
I still cannot get my application, i.e. Spark, to utilize all the cores on the cluster. The Spark executors seem to be taking up all the available memory correctly, but each executor only ever takes a single core and no more.
Here is what is set in spark-defaults.conf:
spark.executor.cores 3
spark.executor.memory 5100m
spark.yarn.executor.memoryOverhead 800
spark.driver.memory 2g
spark.yarn.driver.memoryOverhead 400
spark.executor.instances 28
spark.reducer.maxMbInFlight 120
spark.shuffle.file.buffer.kb 200
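For reference, the same executor settings can equally be passed as flags at submit time instead of via spark-defaults.conf; a minimal sketch (the application JAR and class name are placeholders, not from the original question):

```shell
# Submit-time equivalents of the conf entries above
# (com.example.MyApp and myapp.jar are hypothetical placeholders)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 3 \
  --executor-memory 5100m \
  --num-executors 28 \
  --class com.example.MyApp \
  myapp.jar
```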
Notice that spark.executor.cores is set to 3, but it doesn't work. How do I fix this?
Recommended Answer
The problem lies not with yarn-site.xml or spark-defaults.conf, but with the resource calculator that assigns cores to the executors (or, in the case of MapReduce jobs, to the Mappers/Reducers).
The default resource calculator, i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, uses only memory information when allocating containers, and CPU scheduling is not enabled by default. To take both memory and CPU into account, the resource calculator needs to be changed to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in the capacity-scheduler.xml file.
Here's what needs to change:
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
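After editing capacity-scheduler.xml, the ResourceManager has to pick up the change (typically via a restart or a queue refresh). As a quick sanity check, the property can be verified programmatically; a minimal sketch using only the Python standard library, run against an inlined sample of the file's contents (on a real cluster you would read the file from the Hadoop configuration directory instead):

```python
import xml.etree.ElementTree as ET

# Inlined sample of capacity-scheduler.xml for illustration only;
# a real check would load the actual file from the cluster's conf dir.
CAPACITY_SCHEDULER_XML = """
<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
"""

def resource_calculator(xml_text):
    """Return the configured resource-calculator class name, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "yarn.scheduler.capacity.resource-calculator":
            return prop.findtext("value")
    return None

print(resource_calculator(CAPACITY_SCHEDULER_XML))
```

If the function returns the DominantResourceCalculator class name, the scheduler is configured to consider CPU as well as memory.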