Why does vcore always equal the number of nodes in Spark on YARN?


Question


I have a Hadoop cluster with 5 nodes, each of which has 12 cores and 32GB of memory. I use YARN as the MapReduce framework, so I have the following YARN settings:

  • yarn.nodemanager.resource.cpu-vcores=10
  • yarn.nodemanager.resource.memory-mb=26100
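
These two properties would normally be set in `yarn-site.xml` on each NodeManager; a sketch of what that fragment could look like (the file location and surrounding configuration are assumed, not stated in the question):

```xml
<!-- yarn-site.xml on each NodeManager (assumed placement) -->
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>10</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>26100</value>
</property>
```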

Then the cluster metrics shown on my YARN cluster page (http://myhost:8088/cluster/apps) displayed that VCores Total is 40. This is pretty fine!

Then I installed Spark on top of it and used spark-shell in yarn-client mode.

I ran one Spark job with the following configuration:

  • --driver-memory 20480m
  • --executor-memory 20000m
  • --num-executors 4
  • --executor-cores 10
  • --conf spark.yarn.am.cores=2
  • --conf spark.yarn.executor.memoryOverhead=5600
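
Put together, the launch could look roughly like this (a sketch assuming spark-shell in yarn-client mode as described above; the exact command line is not shown in the question):

```shell
# Hypothetical invocation combining the flags listed above
spark-shell --master yarn-client \
  --driver-memory 20480m \
  --executor-memory 20000m \
  --num-executors 4 \
  --executor-cores 10 \
  --conf spark.yarn.am.cores=2 \
  --conf spark.yarn.executor.memoryOverhead=5600
```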

I set --executor-cores to 10 and --num-executors to 4, so logically there should be 40 Vcores Used in total. However, when I checked the same YARN cluster page after the Spark job started running, it showed only 4 Vcores Used and 4 Vcores Total.

I also found that there is a parameter in capacity-scheduler.xml called yarn.scheduler.capacity.resource-calculator:

"The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc."

I then changed that value to DominantResourceCalculator.

But when I restarted YARN and ran the same Spark application, I still got the same result: the cluster metrics still showed that VCores Used is 4. I also checked the CPU and memory usage on each node with the htop command and found that none of the nodes had all 10 CPU cores fully occupied. What can be the reason?

I also tried running the same Spark job in a fine-grained way, i.e. with --num-executors 40 --executor-cores 1. When I checked the CPU status on each worker node again this way, all CPU cores were fully occupied.

Solution

I was wondering the same thing, but changing the resource calculator worked for me.
This is how I set the property:

    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>

Check in the YARN UI how many containers and vcores are assigned to the application. With the change, the number of containers should be executors + 1, and the vcores should be (executor-cores × num-executors) + 1. For the job above that would be 4 + 1 = 5 containers and (10 × 4) + 1 = 41 vcores, with the extra container and vcore belonging to the application master.
