Error on starting HDFS daemons on a Hadoop multi-node cluster


Problem description

I am having an issue with a Hadoop multi-node setup, which appears as soon as I start the HDFS daemons on the master (bin/start-dfs.sh).

I get the logs below on the master:

starting namenode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-namenode-localhost.localdomain.out
slave: Warning: $HADOOP_HOME is deprecated.
slave:
slave: starting datanode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-datanode-localhost.localdomain.out
master: Warning: $HADOOP_HOME is deprecated.
master:
master: starting datanode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-datanode-localhost.localdomain.out
master: Warning: $HADOOP_HOME is deprecated.
master:
master: starting secondarynamenode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-localhost.localdomain.out

On the slave, I get the logs below in the hadoop-hduser-datanode-localhost.localdomain.log file.

Can someone advise me on what is wrong with the setup?

2013-07-24 12:10:59,373 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-24 12:11:00,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-24 12:11:00,377 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.0.1:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
        at org.apache.hadoop.ipc.Client.call(Client.java:1112)

Recommended answer

Make sure your NameNode is running fine. If it is already running, see if there is any problem with the connection: your DataNode is not able to talk to the NameNode. Make sure you have added the IP address and hostname of the machine to the /etc/hosts file of your slave. Try to telnet to 192.168.0.1:54310 and see whether you are able to connect.
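
A minimal way to run those checks from the shell (the hostnames master/slave and the address 192.168.0.1:54310 come from the logs above; the slave IP 192.168.0.2 is only an illustrative placeholder):

    # On the master: confirm the NameNode JVM is actually running
    jps    # the output should include a NameNode process

    # Example /etc/hosts entries for every node in the cluster
    192.168.0.1    master
    192.168.0.2    slave

    # From the slave: verify the NameNode RPC port is reachable
    telnet 192.168.0.1 54310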

Showing us the NN logs would be helpful.

See what the wiki has to say about this problem: you get a TCP No Route To Host error, often wrapped in a Java IOException, when one machine on the network does not know how to send TCP packets to the machine specified.

Some possible causes (not an exhaustive list):

  • The hostname of the remote machine is wrong in the configuration files.
  • The client's host table /etc/hosts has an invalid IP address for the target host.
  • The DNS server's host table has an invalid IP address for the target host.
  • The client's routing tables (iptables on Linux) are wrong (a quick firewall check is sketched after this list).
  • The DHCP server is publishing bad routing information.
  • Client and server are on different subnets and are not set up to talk to each other. This may be an accident, or it may be a deliberate lockdown of the Hadoop cluster.
  • The machines are trying to communicate using IPv6; Hadoop does not currently support IPv6 (a workaround is sketched after this list).
  • The host's IP address has changed but a long-lived JVM is caching the old value. This is a known problem with JVMs (search for "java negative DNS caching" for details and solutions).
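
For the firewall, IPv6, and DNS-caching causes above, here is a sketch of the usual checks and workarounds (assuming a Hadoop 1.x layout with conf/hadoop-env.sh, as in the question; adjust paths for your install):

    # Look for REJECT rules on the master: "No route to host" is often
    # a firewall rejecting the DataNode, not a real routing problem
    sudo iptables -L -n

    # Force the Hadoop JVMs onto IPv4 by adding this line to
    # conf/hadoop-env.sh on every node
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

    # JVM DNS caching is controlled by the networkaddress.cache.ttl
    # security property (in the JRE's java.security file); the quick
    # fix after an IP change is simply to restart the daemons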

The quick solution: restart the JVMs.

These are all network configuration/router issues. As it is your network, only you can find out and track down the problem.

