Hadoop 1.2.1 - multinode cluster - Reducer phase hangs for Wordcount program?


Problem description


My question may sound redundant here, but the solutions to the earlier questions were all ad hoc. I have tried a few, but with no luck yet.

Actually, I am working on hadoop-1.2.1 (on Ubuntu 14). Initially I had a single-node set-up, and there I ran the WordCount program successfully. Then I added one more node to it according to this tutorial. It started successfully, without any errors, but now when I run the same WordCount program it hangs in the reduce phase. I looked at the task-tracker logs; they are given below:

    INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_m_000002_0 task's state:UNASSIGNED
    INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_m_000002_0 which needs 1 slots
    INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_m_000002_0 which needs 1 slots
    INFO org.apache.hadoop.mapred.JobLocalizer: Initializing user hadoopuser on this TT.
    INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_m_18975496
    INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_m_18975496 spawned.
    INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_m_000002_0/taskjvm.sh
    INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_m_18975496 given task: attempt_201509110037_0001_m_000002_0
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_m_000002_0 0.0% hdfs://HadoopMaster:54310/input/file02:25+3
    INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201509110037_0001_m_000002_0 is done.
    INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201509110037_0001_m_000002_0  was 6
    INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201509110037_0001_m_18975496 exited with exit code 0. Number of tasks it ran: 1
    INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_r_000000_0 task's state:UNASSIGNED
    INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_r_000000_0 which needs 1 slots
    INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_r_000000_0 which needs 1 slots
    INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
    INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoopuser for UID 10 from the native implementation
    INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_r_18975496
    INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_r_18975496 spawned.
    INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_r_000000_0/taskjvm.sh
    INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_r_18975496 given task: attempt_201509110037_0001_r_000000_0
    INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:500, dest: 127.0.0.1:55946, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201509110037_0001_m_000002_0, duration: 7129894
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
    

Also, on the console where I am running the program, it hangs at:

    00:39:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    00:39:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    00:39:24 WARN snappy.LoadSnappy: Snappy native library not loaded
    00:39:24 INFO mapred.FileInputFormat: Total input paths to process : 2
    00:39:24 INFO mapred.JobClient: Running job: job_201509110037_0001
    00:39:25 INFO mapred.JobClient:  map 0% reduce 0%
    00:39:28 INFO mapred.JobClient:  map 100% reduce 0%
    00:39:35 INFO mapred.JobClient:  map 100% reduce 11%
    

My configuration files are as follows:

    //core-site.xml

    <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>
    
    <property>
      <name>fs.default.name</name>
      <value>hdfs://HadoopMaster:54310</value>
      <description>The name of the default file system.  A URI whose
      scheme and authority determine the FileSystem implementation.  The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class.  The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
    </configuration>
    

    //hdfs-site.xml

    <configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.
      </description>
    </property>   
    </configuration>
    

    //mapred-site.xml

    <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>HadoopMaster:54311</value>
      <description>The host and port that the MapReduce job tracker runs
      at.  If "local", then jobs are run in-process as a single map
      and reduce task.
      </description>
    </property>
    <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
      <value>0.80</value>
    </property>    
    </configuration>
    

    /etc/hosts

    127.0.0.1 localhost
    127.0.1.1 M-1947
    
    #HADOOP CLUSTER SETUP
    172.50.88.54 HadoopMaster
    172.50.88.60 HadoopSlave1
    
    # The following lines are desirable for IPv6 capable hosts
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    

    /etc/hostname

    M-1947

    //masters

    HadoopMaster

    //slaves

    HadoopMaster

    HadoopSlave1

I have been struggling with it for a long time; any help is appreciated. Thanks!

Solution

Got it fixed. Although the same issue appears in multiple questions on the forums, the verified solution in my case is that hostname resolution for every node in the cluster must be correct (moreover, this issue does not depend on the size of the cluster).

It is actually a DNS-lookup issue; make the changes below to resolve it:

1. Try printing the hostname on each machine using '$ hostname'.

2. Check that the hostname printed for each machine is the same as the entry made in the masters/slaves files for the respective machine.

3. If it doesn't match, rename the host by making the change in the /etc/hostname file and reboot the system.

Example:

In the /etc/hosts file (let's say on the master machine of the Hadoop cluster):

    127.0.0.1 localhost
    127.0.1.1 john-machine

    #Hadoop cluster
    172.50.88.21 HadoopMaster
    172.50.88.22 HadoopSlave1
    172.50.88.23 HadoopSlave2

Then the /etc/hostname file (on the master machine) should contain the following entry (for the above issue to be resolved):

    HadoopMaster

Similarly, verify the /etc/hostname file on each slave node.
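The consistency check in the steps above can be sketched as a small shell script. This is a minimal sketch, assuming the hypothetical hostnames and IPs from the example hosts file above; on a real node you would point it at the actual /etc/hosts and compare each node name against the output of '$ hostname'.

```shell
#!/bin/sh
# Sketch: verify that each cluster node name has exactly one entry in a
# hosts file, which is what correct hostname resolution requires here.
# The IPs and names below are the hypothetical ones from the example.

HOSTS_FILE="$(mktemp)"
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1 localhost
127.0.1.1 john-machine

#Hadoop cluster
172.50.88.21 HadoopMaster
172.50.88.22 HadoopSlave1
172.50.88.23 HadoopSlave2
EOF

check_node() {
    # Count hosts-file lines whose hostname field matches; exactly one is healthy.
    n=$(awk -v h="$1" '$2 == h {c++} END {print c+0}' "$HOSTS_FILE")
    if [ "$n" -eq 1 ]; then
        echo "$1 OK"
    else
        echo "$1 BAD ($n entries)"
    fi
}

check_node HadoopMaster    # prints: HadoopMaster OK
check_node HadoopSlave1    # prints: HadoopSlave1 OK
check_node HadoopSlave3    # not in the file, so it is flagged as BAD
```

Running the same check on every node (with the real /etc/hosts and the machine's own hostname) quickly surfaces the mismatch that makes the reduce phase hang at the copy step.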
