Apache Hadoop Yarn - Underutilization of cores


Question


No matter how much I tinker with the settings in yarn-site.xml, i.e., using all of the options below:

yarn.scheduler.minimum-allocation-vcores
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.maximum-allocation-vcores
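
For context, these properties live in yarn-site.xml; a minimal sketch of how they are typically set (the values here are illustrative assumptions, not the asker's actual settings):

<!-- yarn-site.xml: illustrative values only -->
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>24576</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>24576</value>
</property>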


I still cannot get my application, i.e., Spark, to utilize all the cores on the cluster. The Spark executors seem to be taking up all the available memory correctly, but each executor keeps taking a single core and no more.

Here are the settings in spark-defaults.conf:

spark.executor.cores                    3
spark.executor.memory                   5100m
spark.yarn.executor.memoryOverhead      800
spark.driver.memory                     2g
spark.yarn.driver.memoryOverhead        400
spark.executor.instances                28
spark.reducer.maxMbInFlight             120
spark.shuffle.file.buffer.kb            200


Notice that spark.executor.cores is set to 3, but it doesn't work. How do I fix this?
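
For reference, the same resource request can be expressed directly on the spark-submit command line instead of spark-defaults.conf; a minimal sketch, where your-app.jar stands in for the actual application:

spark-submit --master yarn \
    --executor-cores 3 \
    --executor-memory 5100m \
    --num-executors 28 \
    your-app.jar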

Answer


The problem lies not with yarn-site.xml or spark-defaults.conf but with the resource calculator that assigns cores to the executors (or, in the case of MapReduce jobs, to the Mappers/Reducers).


The default resource calculator, i.e., org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, uses only memory information when allocating containers; CPU scheduling is not enabled by default. To schedule on both memory and CPU, the resource calculator needs to be changed to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in the capacity-scheduler.xml file.


Here's what needs to change:

<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
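
After editing capacity-scheduler.xml, the ResourceManager has to pick up the change. A minimal sketch, assuming a stock Apache Hadoop layout (script names vary by version and distribution):

# Reload the capacity scheduler configuration on a running cluster
yarn rmadmin -refreshQueues

# If the calculator change does not take effect, restart the ResourceManager
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager

Once the change is live, the ResourceManager web UI (port 8088 by default) should show each executor container holding the requested number of vcores rather than just one.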
