How to limit dynamic self allocation of resources in Hadoop cluster under Yarn?


Problem description



In our Hadoop cluster running under Yarn we have a problem: some "smarter" people are able to grab significantly larger chunks of resources by configuring Spark jobs in pySpark Jupyter notebooks like this:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("name")
        .setMaster("yarn-client")
        .set("spark.executor.instances", "1000")
        .set("spark.executor.memory", "64g")
        )

sc = SparkContext(conf=conf)

This leads to a situation where these people literally squeeze out the other, less "smart" users.

Is there a way to forbid users from self-allocating resources and leave resource allocation solely to Yarn?

Solution

YARN has very good support for capacity planning in a multi-tenant cluster through queues; the YARN ResourceManager uses the CapacityScheduler by default.

Here we take the queue name alpha in spark-submit for demonstration purposes.

$ ./bin/spark-submit --class path/to/class/file \
    --master yarn-cluster \
    --queue alpha \
    jar/location \
    args
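
For the pySpark notebook case from the question, the job can be routed to the same queue from SparkConf instead of hand-picking executor counts. This is only a minimal sketch, assuming Spark on YARN and that the alpha queue configured below exists; spark.yarn.queue is the standard Spark property for selecting the YARN queue:

from pyspark import SparkConf, SparkContext

# Minimal sketch: route the notebook job to the "alpha" queue so the
# CapacityScheduler limits configured below apply to it.
conf = (SparkConf()
        .setAppName("name")
        .setMaster("yarn-client")
        .set("spark.yarn.queue", "alpha"))

sc = SparkContext(conf=conf)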

Set up the queues:

CapacityScheduler has a predefined queue called root. All queues in the system are children of the root queue. In capacity-scheduler.xml, the parameter yarn.scheduler.capacity.root.queues is used to define the child queues; for example, to create 3 queues, specify the names of the queues in a comma-separated list.

<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>alpha,beta,default</value>
    <description>The queues at this level (root is the root queue).</description>
</property>

These are a few important properties to consider for capacity planning.

<property>
    <name>yarn.scheduler.capacity.root.alpha.capacity</name>
    <value>50</value>
    <description>Queue capacity in percentage (%) as a float (e.g. 12.5). The sum of capacities for all queues, at each level, must be equal to 100. Applications in the queue may consume more resources than the queue’s capacity if there are free resources, providing elasticity.</description>
</property>

<property>
    <name>yarn.scheduler.capacity.root.alpha.maximum-capacity</name>
    <value>80</value>
    <description>Maximum queue capacity in percentage (%) as a float. This limits the elasticity for applications in the queue. Defaults to -1 which disables it.</description>
</property>

<property>
    <name>yarn.scheduler.capacity.root.alpha.minimum-user-limit-percent</name>
    <value>10</value>
    <description>Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For example, suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. The default is 100. The value is specified as an integer.</description>
</property>
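
To make the arithmetic in that description concrete, here is a small illustrative sketch; the formula is only inferred from the description above, not taken from the CapacityScheduler source:

# Illustrative only: the effective per-user share implied by
# minimum-user-limit-percent, following the description above.
def user_limit_percent(minimum_user_limit_percent, active_users):
    # Each active user gets an equal share of the queue, but never
    # less than the configured minimum-user-limit-percent.
    return max(minimum_user_limit_percent, 100.0 / active_users)

for users in (1, 2, 3, 4):
    print(users, user_limit_percent(25, users))  # -> 100.0, 50.0, 33.3..., 25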

Link: YARN CapacityScheduler Queue Properties
