yarn hadoop 2.4.0: info message: ipc.Client Retrying connect to server
Problem description
I've searched for two days for a solution, but nothing worked.
First, I'm new to the whole Hadoop/YARN/HDFS topic and want to configure a small cluster.
The message below doesn't show up every time I run an example from mapreduce-examples.jar. Sometimes teragen works, sometimes not. In some cases the whole job fails, in others it finishes successfully. Sometimes the job fails without printing the message at all.
14/06/08 15:42:46 INFO ipc.Client: Retrying connect to server: FQDN-HOSTNAME/XXX.XX.XX.XXX:53022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
This message is printed 30 times. The port (53022 in the example) also changes every time a job is started. If the job finishes successfully, this is printed:
14/06/08 15:34:20 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
14/06/08 15:34:20 INFO mapreduce.Job: Job job_1402234146062_0002 running in uber mode : false
14/06/08 15:34:20 INFO mapreduce.Job: map 100% reduce 100%
14/06/08 15:34:20 INFO mapreduce.Job: Job job_1402234146062_0002 completed successfully
If it fails, this is shown:
INFO mapreduce.Job: Job job_1402234146062_0005 failed with state FAILED due to: Task failed task_1402234146062_0005_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
In this case, some tasks failed, but no reason or message can be found in the log files of the NodeManager, DataNode, ResourceManager, etc.
INFO mapreduce.Job: Task Id : attempt_1402234146062_0006_m_000002_1, Status : FAILED
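The port that changes on every run (53022 above) is consistent with an ephemeral port: the MapReduce ApplicationMaster binds a fresh client port for each job, so the retry target differs every time. A minimal Python sketch of the same OS behavior, binding to port 0 so the kernel assigns a free port (illustrative only, not Hadoop code):

```python
import socket

def ephemeral_port():
    """Bind to port 0 and return the port the OS assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# Each bind gets a kernel-chosen high port, just as each submitted
# job's ApplicationMaster listens on a new port.
# print(ephemeral_port())
```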
Additional information about my configuration:
Used OS: CentOS 6.5
Java version: OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13), OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.address</name>
    <value>FQDN-HOSTNAME:8050</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.address</name>
    <value>FQDN-HOSTNAME:8040</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>FQDN-HOSTNAME:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>FQDN-HOSTNAME:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>FQDN-HOSTNAME:8032</value>
  </property>
</configuration>
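Since every address in yarn-site.xml repeats the hostname by hand, a quick consistency check helps rule out a typo in one of them. A sketch (function names are illustrative; pass it the file's text) that collects all configured `*.address` values and flags any host that differs from the rest:

```python
import xml.etree.ElementTree as ET

def configured_addresses(xml_text):
    """Return {property-name: (host, port)} for every *.address value."""
    root = ET.fromstring(xml_text)
    out = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", "").strip()
        value = prop.findtext("value", "").strip()
        if name.endswith(".address") and ":" in value:
            host, port = value.rsplit(":", 1)
            out[name] = (host, int(port))
    return out

def mismatched_hosts(addresses):
    """Return the set of distinct hosts if more than one is used, else empty."""
    hosts = {host for host, _ in addresses.values()}
    return hosts if len(hosts) > 1 else set()

# Usage (path assumed):
# with open("yarn-site.xml") as f:
#     addrs = configured_addresses(f.read())
# mismatched_hosts(addrs) should be empty on a consistent config.
```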
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:///var/data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:///var/data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///var/data/hadoop/hdfs/dn</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>/mapred/tempDir</value>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/mapred/localDir</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>FQDN-HOSTNAME:10020</value>
  </property>
</configuration>
I hope somebody can help me. :)
Thank you,
Norman
Recommended answer
The job sometimes finishes successfully because, when you have a single reducer, that reduce task may by chance be sent to a working NodeManager; in that case the job succeeds.
You have to make sure that FQDN-HOSTNAME is written exactly the same way in the slaves file. If I remember correctly, my solution was to remove the hostname-mapping entry in /etc/hosts, i.e. comment it out like this:
#127.0.0.1 FQDN-HOSTNAME
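The commented-out line removes the mapping of the cluster hostname to 127.0.0.1, so daemons stop binding to loopback while other nodes try to reach them on the real interface. A small sketch (function name is illustrative) that scans /etc/hosts text for exactly this misconfiguration:

```python
def loopback_mappings(hosts_text, hostname):
    """Return /etc/hosts lines that map `hostname` to a loopback address."""
    bad = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if ip.startswith("127.") and hostname in names:
            bad.append(line)
    return bad

# Usage (path assumed):
# with open("/etc/hosts") as f:
#     print(loopback_mappings(f.read(), "FQDN-HOSTNAME"))
# An empty list means the hostname no longer resolves to loopback here.
```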