GCP Dataproc - configure YARN fair scheduler

Question

I was trying to set up a Dataproc cluster that would compute only one job (or a specified maximum number of jobs) at a time, with the rest waiting in a queue.

I found this solution, How to configure monopolistic FIFO application queue in YARN?, but since I'm always creating new clusters, I needed to automate this. I added this to the cluster creation request:

"softwareConfig": {
    "properties": {
        "yarn:yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
        "yarn:yarn.scheduler.fair.user-as-default-queue":"false",
        "yarn:yarn.scheduler.fair.allocation.file":"$HADOOP_CONF_DIR/fair-scheduler.xml",
     }
}

with another line in the init action script:

sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml
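
As a side note, the same softwareConfig properties can also be passed through the gcloud CLI when creating the cluster; this is only a sketch, assuming gcloud is installed, with the cluster name and region as placeholders:

# Sketch only: cluster name and region are placeholders; the property values
# mirror the softwareConfig block above (single quotes keep $HADOOP_CONF_DIR literal).
gcloud dataproc clusters create my-cluster \
    --region=europe-west1 \
    --properties='yarn:yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler,yarn:yarn.scheduler.fair.user-as-default-queue=false,yarn:yarn.scheduler.fair.allocation.file=$HADOOP_CONF_DIR/fair-scheduler.xml'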

and the cluster reports this when I fetch its config:

'softwareConfig': {
  'imageVersion': '1.2.27',
  'properties': {
    'capacity-scheduler:yarn.scheduler.capacity.root.default.ordering-policy': 'fair',
    'core:fs.gs.block.size': '134217728',
    'core:fs.gs.metadata.cache.enable': 'false',
    'distcp:mapreduce.map.java.opts': '-Xmx4096m',
    'distcp:mapreduce.map.memory.mb': '5120',
    'distcp:mapreduce.reduce.java.opts': '-Xmx4096m',
    'distcp:mapreduce.reduce.memory.mb': '5120',
    'hdfs:dfs.datanode.address': '0.0.0.0:9866',
    'hdfs:dfs.datanode.http.address': '0.0.0.0:9864',
    'hdfs:dfs.datanode.https.address': '0.0.0.0:9865',
    'hdfs:dfs.datanode.ipc.address': '0.0.0.0:9867',
    'hdfs:dfs.namenode.http-address': '0.0.0.0:9870',
    'hdfs:dfs.namenode.https-address': '0.0.0.0:9871',
    'hdfs:dfs.namenode.secondary.http-address': '0.0.0.0:9868',
    'hdfs:dfs.namenode.secondary.https-address': '0.0.0.0:9869',
    'mapred-env:HADOOP_JOB_HISTORYSERVER_HEAPSIZE': '3840',
    'mapred:mapreduce.job.maps': '189',
    'mapred:mapreduce.job.reduce.slowstart.completedmaps': '0.95',
    'mapred:mapreduce.job.reduces': '63',
    'mapred:mapreduce.map.cpu.vcores': '1',
    'mapred:mapreduce.map.java.opts': '-Xmx4096m',
    'mapred:mapreduce.map.memory.mb': '5120',
    'mapred:mapreduce.reduce.cpu.vcores': '1',
    'mapred:mapreduce.reduce.java.opts': '-Xmx4096m',
    'mapred:mapreduce.reduce.memory.mb': '5120',
    'mapred:mapreduce.task.io.sort.mb': '256',
    'mapred:yarn.app.mapreduce.am.command-opts': '-Xmx4096m',
    'mapred:yarn.app.mapreduce.am.resource.cpu-vcores': '1',
    'mapred:yarn.app.mapreduce.am.resource.mb': '5120',
    'spark-env:SPARK_DAEMON_MEMORY': '3840m',
    'spark:spark.driver.maxResultSize': '1920m',
    'spark:spark.driver.memory': '3840m',
    'spark:spark.executor.cores': '8',
    'spark:spark.executor.memory': '37237m',
    'spark:spark.yarn.am.memory': '640m',
    'yarn:yarn.nodemanager.resource.memory-mb': '81920',
    'yarn:yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler',
    'yarn:yarn.scheduler.fair.allocation.file': '$HADOOP_CONF_DIR/fair-scheduler.xml',
    'yarn:yarn.scheduler.fair.user-as-default-queue': 'false',
    'yarn:yarn.scheduler.maximum-allocation-mb': '81920',
    'yarn:yarn.scheduler.minimum-allocation-mb': '1024'
  }
},

The file fair-scheduler.xml also contains the specified code (everything is on one line, but I don't think that could be the problem).
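
For reference, a heredoc that writes the same allocation, just formatted over several lines, would be functionally identical to the one-liner in the init action:

cat > /etc/hadoop/conf/fair-scheduler.xml <<'EOF'
<allocations>
  <queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
EOF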

After all this, the cluster still behaves as if the Capacity Scheduler were in charge. I don't know why. Any advice would help. Thanks.

Answer

Since the init action script runs after the cluster is created, the YARN service is already running by the time the script modifies yarn-site.xml.

So after modifying the XML config file and creating the additional XML file, the YARN service needs to be restarted. That can be done with this command:

sudo systemctl restart hadoop-yarn-resourcemanager.service
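
To verify that the restart actually switched schedulers, the ResourceManager's REST API can be queried from the master node; a quick check, assuming the default web UI port 8088:

# Should print "fairScheduler" once the FairScheduler is active;
# an empty result means the capacity scheduler is still in charge.
curl -s http://localhost:8088/ws/v1/cluster/scheduler | grep -o 'fairScheduler' | head -n 1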

Also, since $HADOOP_CONF_DIR was not set (I thought it would be), the full path to the file has to be used. But then YARN fails to start at cluster creation, because it can't find a file that is only created later by the init action script. So what I did was append those last few lines to yarn-site.xml from the init action script as well. The init action script is the following:

# Only the master node runs the ResourceManager, so only modify it there.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
    # Write the allocation file: at most one running application per queue.
    echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
    echo "  <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
    echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

    # Remove the closing </configuration> line from yarn-site.xml ...
    sed -i '$ d' /etc/hadoop/conf/yarn-site.xml

    # ... append the allocation-file property with its full path, and close the file again.
    echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
    echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml

    # Restart the ResourceManager so it picks up the FairScheduler and the new allocation file.
    systemctl restart hadoop-yarn-resourcemanager.service
fi
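
As a further side note, Dataproc images ship a bdconfig helper that Google's published init actions use for this kind of XML edit; if it is available on the image, the sed/echo block above could be replaced by a single call. This is only a sketch, not verified on image version 1.2.27:

# Assumes bdconfig is on PATH, as it is on standard Dataproc images.
bdconfig set_property \
    --configuration_file /etc/hadoop/conf/yarn-site.xml \
    --name 'yarn.scheduler.fair.allocation.file' \
    --value '/etc/hadoop/conf/fair-scheduler.xml' \
    --clobber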
