YARN not preempting resources based on fair shares when running a Spark job


Question

I have a problem with YARN's fair-share preemption when running Spark jobs on Fair Scheduler queues.

For the tests I've configured Hadoop 2.6 (tried 2.7 also) to run in pseudo-distributed mode with local HDFS on macOS. For job submission I used the "Pre-built Spark 1.4 for Hadoop 2.6 and later" (tried 1.5 also) distribution from Spark's website.

When tested with a basic configuration on Hadoop MapReduce jobs, the Fair Scheduler works as expected: when demand exceeds the cluster's capacity, fair shares are calculated, and resources for jobs in different queues are preempted and rebalanced based on those calculations.

The same test was run with Spark jobs. In that case YARN calculates the fair shares correctly for each job, but resources for the Spark containers are not rebalanced.

Here are my conf files:

$HADOOP_HOME/etc/hadoop/yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
   </property>
   <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
   </property>
   <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
   </property>
</configuration>

$HADOOP_HOME/etc/hadoop/fair-scheduler.xml

<?xml version="1.0" encoding="UTF-8"?>
<allocations>
   <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
   <queue name="prod">
      <weight>40</weight>
      <schedulingPolicy>fifo</schedulingPolicy>
   </queue>
   <queue name="dev">
      <weight>60</weight>
      <queue name="eng" />
      <queue name="science" />
   </queue>
   <queuePlacementPolicy>
      <rule name="specified" create="false" />
      <rule name="primaryGroup" create="false" />
      <rule name="default" queue="dev.eng" />
   </queuePlacementPolicy>
</allocations>

$HADOOP_HOME/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

$HADOOP_HOME/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
</configuration>

The test case is:

Run a job on the "prod" queue with weight 40 (it must be allocated 40% of all resources). As expected, the job takes all the free resources it needs (62.5% of the cluster's resources).

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--driver-memory 512M \
--executor-memory 768M \
--executor-cores 1 \
--num-executors 2 \
--queue prod \
lib/spark-examples*.jar 100000

After that, run the same job on the "dev.eng" queue with weight 60, which means the job must be allocated 60% of all resources, shrinking the first job's share to ~40%.

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--driver-memory 512M \
--executor-memory 768M \
--executor-cores 1 \
--num-executors 2 \
--queue dev.eng \
lib/spark-examples*.jar 100000

Unfortunately, the cluster's resource split does not change: it stays at 62.5% for the first job and 37.5% for the second.
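
For what it's worth, these exact percentages are consistent with YARN's defaults, though the node's capacity isn't shown in the question, so the following accounting is an assumption. It presumes the NodeManager's default 8192 MB capacity, Spark's default ~384 MB per-container memory overhead, and the Fair Scheduler's default 1024 MB allocation increment:

driver:     512 MB + 384 MB overhead =  896 MB -> rounded up to 1024 MB
executor:   768 MB + 384 MB overhead = 1152 MB -> rounded up to 2048 MB each
first job:  1024 + 2 x 2048 = 5120 MB, and 5120 / 8192 = 62.5%
second job: only 3072 MB remain free: 1024 (driver) + 2048 (one executor) = 37.5%

Under those assumptions the second job never even receives its second executor, and without preemption nothing forces the first job to give anything back.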

Answer

You need to set one of the preemption timeouts in your allocation xml. One is for minimum share and one is for fair share; both are in seconds. By default, the timeouts are not set.
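
For example, the fair-scheduler.xml above could be extended with top-level defaults, as in the sketch below. The 30-second values are arbitrary choices for testing, not recommendations. Note that since none of your queues define minResources, the fair-share timeout is the one that actually matters here:

<?xml version="1.0" encoding="UTF-8"?>
<allocations>
   <!-- Preempt containers from other queues once a queue has been below
        half of its fair share for 30 seconds (illustrative value). -->
   <defaultFairSharePreemptionTimeout>30</defaultFairSharePreemptionTimeout>
   <!-- Only takes effect for queues that define a minimum share. -->
   <defaultMinSharePreemptionTimeout>30</defaultMinSharePreemptionTimeout>
   <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
   <queue name="prod">
      <weight>40</weight>
      <schedulingPolicy>fifo</schedulingPolicy>
   </queue>
   <queue name="dev">
      <weight>60</weight>
      <queue name="eng" />
      <queue name="science" />
   </queue>
   <queuePlacementPolicy>
      <rule name="specified" create="false" />
      <rule name="primaryGroup" create="false" />
      <rule name="default" queue="dev.eng" />
   </queuePlacementPolicy>
</allocations>

The Fair Scheduler re-reads the allocation file periodically (every 10 seconds by default), so the change should take effect without restarting the ResourceManager.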

From Hadoop: The Definitive Guide, 4th Edition:

If a queue waits for as long as its minimum share preemption timeout without receiving its minimum guaranteed share, then the scheduler may preempt other containers. The default timeout is set for all queues via the defaultMinSharePreemptionTimeout top-level element in the allocation file, and on a per-queue basis by setting the minSharePreemptionTimeout element for a queue.

Likewise, if a queue remains below half of its fair share for as long as the fair share preemption timeout, then the scheduler may preempt other containers. The default timeout is set for all queues via the defaultFairSharePreemptionTimeout top-level element in the allocation file, and on a per-queue basis by setting fairSharePreemptionTimeout on a queue. The threshold may also be changed from its default of 0.5 by setting defaultFairSharePreemptionThreshold and fairSharePreemptionThreshold (per-queue).
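
Per-queue overrides follow the same pattern. A sketch with illustrative values, giving the "dev" queue a more aggressive preemption policy than the defaults:

<queue name="dev">
   <weight>60</weight>
   <!-- Preempt once this queue has sat below 80% (instead of the default
        50%) of its fair share for 15 seconds. Values are illustrative. -->
   <fairSharePreemptionTimeout>15</fairSharePreemptionTimeout>
   <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
   <queue name="eng" />
   <queue name="science" />
</queue>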

