Hadoop Yarn: How to limit dynamic self allocation of resources with Spark?


Problem description

In our Hadoop cluster, which runs under Yarn, we are having a problem: some "smarter" people are able to eat significantly larger chunks of resources by configuring Spark jobs in pySpark Jupyter notebooks like this:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("name")
        .setMaster("yarn-client")
        .set("spark.executor.instances", "1000")
        .set("spark.executor.memory", "64g")
        )

sc = SparkContext(conf=conf)

This leads to a situation where these people literally squeeze out the others who are less "smart".

Is there a way to forbid users from self-allocating resources, leaving resource allocation solely to Yarn?

Answer

YARN has very good support for capacity planning in multi-tenant clusters through queues; the YARN ResourceManager uses the CapacityScheduler by default.
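If a cluster has been switched to a different scheduler, the CapacityScheduler is selected in yarn-site.xml. A sketch of the standard YARN setting, for reference:

```xml
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```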

Here we use the queue name alpha in spark-submit for demo purposes:

$ ./bin/spark-submit --class path/to/class/file \
    --master yarn-cluster \
    --queue alpha \
    jar/location \
    args
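Notebook users who build a SparkConf directly (as in the question) never pass the --queue flag, but the same routing can be applied cluster-wide through spark-defaults.conf. A sketch, assuming the queue names used in this answer; spark.yarn.queue is the standard Spark-on-YARN property:

```properties
# Route Spark applications to the alpha queue by default
spark.yarn.queue    alpha
```

A user can still override this per application, so the actual limit comes from the queue's capacity settings and user limits described below, not from this default alone.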

Setting up queues:

The CapacityScheduler has a predefined queue called root. All queues in the system are children of the root queue. In capacity-scheduler.xml, the parameter yarn.scheduler.capacity.root.queues is used to define the child queues; for example, to create 3 queues, specify their names in a comma-separated list.

<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>alpha,beta,default</value>
    <description>The queues at this level (root is the root queue).</description>
</property>

These are a few important properties to consider for capacity planning.

<property>
    <name>yarn.scheduler.capacity.root.alpha.capacity</name>
    <value>50</value>
    <description>Queue capacity in percentage (%) as a float (e.g. 12.5). The sum of capacities for all queues, at each level, must be equal to 100. Applications in the queue may consume more resources than the queue’s capacity if there are free resources, providing elasticity.</description>
</property>

<property>
    <name>yarn.scheduler.capacity.root.alpha.maximum-capacity</name>
    <value>80</value>
    <description>Maximum queue capacity in percentage (%) as a float. This limits the elasticity for applications in the queue. Defaults to -1 which disables it.</description>
</property>

<property>
    <name>yarn.scheduler.capacity.root.alpha.minimum-user-limit-percent</name>
    <value>10</value>
    <description>Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as an integer.</description>
</property>
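The per-user arithmetic in the description above can be sketched in plain Python (a hypothetical helper for illustration, not part of any YARN API):

```python
def per_user_limit_percent(active_users: int, minimum_user_limit_percent: float = 100.0) -> float:
    """Effective share (%) of queue resources a single user may take.

    Each of the N users with pending demand is offered an equal 100/N split,
    but never less than the configured minimum-user-limit-percent.
    """
    if active_users < 1:
        raise ValueError("at least one active user is required")
    equal_split = 100.0 / active_users
    return min(100.0, max(equal_split, minimum_user_limit_percent))

# With minimum-user-limit-percent = 25, as in the example in the description:
for users in (1, 2, 3, 4, 5):
    print(users, per_user_limit_percent(users, 25.0))
```

This reproduces the figures in the description: 2 users are capped at 50% each, 3 users at 33%, and from 4 users on the configured minimum of 25% is the cap.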

Link: YARN CapacityScheduler Queue Properties
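Queues bound aggregate usage; to also reject a single oversized request such as a 64g executor outright, YARN's per-container caps apply. A sketch of the standard yarn-site.xml properties (the values are illustrative, not recommendations):

```xml
<property>
    <!-- Reject any single container request above 16 GB -->
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
</property>
<property>
    <!-- Reject any single container request above 8 vcores -->
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value>
</property>
```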
