MapReduce job hangs, waiting for AM container to be allocated


Problem description

I tried to run a simple word count as a MapReduce job. Everything works fine when run locally (all work is done on the NameNode). But when I try to run it on a cluster using YARN (adding mapreduce.framework.name=yarn to mapred-site.xml), the job hangs.
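
A typical submission of the bundled word count example looks like the sketch below; the jar path and the HDFS input/output paths are placeholders, not taken from the original post:

# Submit the stock word count example to the cluster (paths are placeholders)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/hduser/input /user/hduser/output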

I came across a similar problem here: MapReduce jobs get stuck in Accepted state

Output from job:

*** START ***
15/12/25 17:52:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/25 17:52:51 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/25 17:52:51 INFO input.FileInputFormat: Total input paths to process : 5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: number of splits:5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1451083949804_0001
15/12/25 17:52:53 INFO impl.YarnClientImpl: Submitted application application_1451083949804_0001
15/12/25 17:52:53 INFO mapreduce.Job: The url to track the job: http://hadoop-droplet:8088/proxy/application_1451083949804_0001/
15/12/25 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001

mapred-site.xml:

<configuration>

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

<property>
   <name>mapreduce.job.tracker</name>
   <value>localhost:54311</value>
</property> 

<!--
<property>
   <name>mapreduce.job.tracker.reserved.physicalmemory.mb</name>
   <value></value>
</property>

<property>
   <name>mapreduce.map.memory.mb</name>
   <value>1024</value>
</property>

<property>
   <name>mapreduce.reduce.memory.mb</name>
   <value>2048</value>
</property>    

<property>
   <name>yarn.app.mapreduce.am.resource.mb</name>
   <value>3000</value>
   <source>mapred-site.xml</source>
</property> -->

</configuration>

yarn-site.xml:

<configuration>
 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>

<!--
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3000</value>
<source>yarn-site.xml</source>
</property>

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>500</value>
</property>

<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>3000</value>
</property>
-->

</configuration>

// I left the commented options in - they did not solve the problem

YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
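
The same state and diagnostics string can also be read from the command line; a sketch, using the application id from the log above:

# Prints YarnApplicationState plus the diagnostics message for this application
yarn application -status application_1451083949804_0001

# Lists every application currently sitting in the ACCEPTED state
yarn application -list -appStates ACCEPTED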

What could be the problem?

EDIT:

I tried this configuration (commented) on machines: NameNode(8GB RAM) + 2x DataNode (4GB RAM). I get the same effect: Job hangs on ACCEPTED state.

EDIT2: changed configuration (thanks @Manjunath Ballur) to:

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-droplet</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-droplet:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-droplet:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-droplet:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-droplet:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-droplet:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $YARN_HOME/*,$YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  <property> 
    <name>yarn.scheduler.minimum-allocation-mb</name> 
    <value>50</value>
  </property>
  <property> 
    <name>yarn.scheduler.maximum-allocation-mb</name> 
    <value>390</value>
  </property>
  <property> 
    <name>yarn.nodemanager.resource.memory-mb</name> 
    <value>390</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

<property>  
    <name>yarn.app.mapreduce.am.resource.mb</name>  
    <value>50</value>
</property>
<property> 
    <name>yarn.app.mapreduce.am.command-opts</name> 
    <value>-Xmx40m</value>
</property>
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>50</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>50</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx40m</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx40m</value>
</property>
</configuration>

Still not working. Additional info: I can see no nodes in the cluster preview (similar problem here: Slave nodes not in Yarn ResourceManager).
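
A quick way to confirm whether any NodeManager has registered at all is the ResourceManager REST API; a sketch, assuming the RM web UI is reachable at hadoop-droplet:8088 as configured above:

# Cluster-wide metrics: activeNodes/unhealthyNodes and totalMB/availableMB show
# whether any NodeManager registered and how much memory it is offering
curl -s http://hadoop-droplet:8088/ws/v1/cluster/metrics

# Per-node view of everything the RM currently knows about
curl -s http://hadoop-droplet:8088/ws/v1/cluster/nodes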

Solution

You should check the status of the NodeManagers in your cluster. If the NM nodes are short on disk space, the RM marks them "unhealthy", and those NMs cannot allocate new containers.
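
The node states can also be listed from the command line; a sketch (the -all flag includes nodes in non-RUNNING states such as UNHEALTHY or LOST):

# List every node the RM knows about, with its state and health report
yarn node -list -all

# Detailed status, including the full health report, for a single node
yarn node -status <node_id_from_the_list_above>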

1) Check the Unhealthy nodes: http://<active_RM>:8088/cluster/nodes/unhealthy

If the "health report" tab says "local-dirs are bad" then it means you need to cleanup some disk space from these nodes.

2) Check the dfs.data.dir property in hdfs-site.xml. It points to the location on the local file system where HDFS data is stored.
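
The configured directories can be read back without opening the XML by hand; a sketch using hdfs getconf (dfs.datanode.data.dir is the current name for the deprecated dfs.data.dir):

# Print the DataNode data directories as the running configuration sees them
hdfs getconf -confKey dfs.datanode.data.dir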

3) Log in to those machines and use the df -h and hadoop fs -du -h commands to measure the space occupied.
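
Putting those two checks together, a sketch (the HDFS path is only an example):

# Local disk usage on the NodeManager/DataNode host
df -h

# Space used under an HDFS directory, in human-readable units
hadoop fs -du -h /user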

4) Check the Hadoop trash and delete its contents if that is what is taking up the space: hadoop fs -du -h /user/user_name/.Trash and hadoop fs -rm -r /user/user_name/.Trash/*
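
A sketch of that check-then-clean sequence; -skipTrash guarantees the deletion is permanent rather than creating another trash checkpoint, and hadoop fs -expunge is an alternative that removes trash checkpoints older than the configured fs.trash.interval:

# How much space the trash is holding
hadoop fs -du -h /user/user_name/.Trash

# Delete the trash contents permanently
hadoop fs -rm -r -skipTrash /user/user_name/.Trash/*

# Alternative: drop trash checkpoints older than the retention interval
hadoop fs -expunge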
