How to configure a monopolistic FIFO application queue in YARN?
Problem description
I need to disable parallel execution of YARN applications in a Hadoop cluster. With its default settings, YARN can run several jobs in parallel. I see no advantage in this, because both jobs then run slower.
I found the setting yarn.scheduler.capacity.maximum-applications, which limits the maximum number of applications, but it affects both submitted and running apps (as stated in the docs). I'd like to keep submitted apps in the queue until the currently running application has finished. How can this be done?
Recommended answer
1) Change the scheduler to FairScheduler
Hadoop distributions use CapacityScheduler by default (Cloudera uses FairScheduler as its default scheduler). Add this property to yarn-site.xml:
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
2) Set the default queue
The Fair Scheduler creates a queue per user: if three different users submit jobs, three separate queues are created and the resources are shared among them. Disable this by adding the following property to yarn-site.xml:
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
This ensures that all jobs go into a single default queue.
3) Limit the maximum number of applications
Now that all jobs are confined to the single default queue, restrict the maximum number of applications that can run in that queue to 1.
Create a file named fair-scheduler.xml under $HADOOP_CONF_DIR and add these entries:
<allocations>
<queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
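A slightly more explicit variant of the allocation file, as a sketch only: instead of relying solely on the cluster-wide default, it caps the default queue itself and asks for FIFO ordering inside it. The maxRunningApps and schedulingPolicy elements come from the Fair Scheduler allocation file format rather than from the answer above, so check that your Hadoop version supports them before using this.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Assumption: every job lands in the single "default" queue (step 2). -->
  <queue name="default">
    <!-- Let only one application run at a time; later ones wait. -->
    <maxRunningApps>1</maxRunningApps>
    <!-- Ask for FIFO ordering of applications within this queue. -->
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
  <!-- Same cap for any queue that does not set its own limit. -->
  <queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
```

Setting the limit on the queue makes the intent visible even if other queues are added later with different limits.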
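Also add this property to yarn-site.xml. Here $HADOOP_CONF_DIR stands for the actual configuration directory; a shell-style variable is not expanded inside the XML value, so write out the real path: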
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>$HADOOP_CONF_DIR/fair-scheduler.xml</value>
</property>
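Putting the three yarn-site.xml properties together may make the result easier to check. The following is a sketch, not a verified configuration: the path /etc/hadoop/conf is an assumed location of the configuration directory and must be replaced with the one used on your cluster.

```xml
<configuration>
  <!-- Step 1: use the Fair Scheduler instead of the default scheduler. -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <!-- Step 2: send all jobs to the "default" queue, not per-user queues. -->
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
  </property>
  <!-- Step 3: point the scheduler at the allocation file.
       /etc/hadoop/conf is an assumed path; substitute your own. -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>
```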
Restart the YARN services after adding these properties.
When multiple applications are submitted, the one ACCEPTED first becomes the active application and the rest are queued as pending applications. These pending applications stay in the ACCEPTED state until the RUNNING application is FINISHED. The active application is allowed to use all the available resources.