Hadoop Datanodes cannot find NameNode


Problem description



I've set up a distributed Hadoop environment within VirtualBox: 4 virtual Ubuntu 11.10 installations, one acting as the master node, the other three as slaves. I followed this tutorial to get the single-node version up and running and then converted to the fully-distributed version. It was working just fine when I was running 11.04; however, when I upgraded to 11.10, it broke. Now all my slaves' logs show the following error message, repeated ad nauseam:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 0 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 1 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 2 time(s).

And so on. I've found other instances of this error message on the Internet (and StackOverflow) but none of the solutions have worked (tried changing the core-site.xml and mapred-site.xml entries to be the IP address rather than hostname; quadruple-checked /etc/hosts on all slaves and master; master can SSH password-less into all slaves). I even tried reverting each slave back to a single-node setup, and they would all work fine in this case (on that note, the master always works fine as both a Datanode and the Namenode).
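For reference, the config change described above would have been along these lines (a sketch only, assuming the classic Hadoop 0.20.x-era property names and the 54310/54311 ports used in this setup; the actual files may differ):

<!-- core-site.xml: point the NameNode address at the master's IP instead of its hostname -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.10:54310</value>
</property>

<!-- mapred-site.xml: same idea for the JobTracker address -->
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.1.10:54311</value>
</property>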

The only symptom I've found that would seem to give a lead is that from any of the slaves, when I attempt a telnet 192.168.1.10 54310, I get Connection refused, suggesting there is some rule blocking access (which must have gone into effect when I upgraded to 11.10).

My /etc/hosts.allow has not changed, however. I tried the rule ALL: 192.168.1., but it did not change the behavior.

Oh yes, and netstat on the master clearly shows tcp ports 54310 and 54311 are listening.
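(As a quick check, something like the following on the master shows which address those ports are actually bound to; a sketch, and the exact netstat flags may vary by distribution:

sudo netstat -tlpn | grep -E '54310|54311'

If the ports show up bound to 127.0.0.1 or 127.0.1.1 rather than 192.168.1.10, the daemons are only listening on loopback and the slaves will never reach them.)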

Anyone have any suggestions to get the slave Datanodes to recognize the Namenode?

EDIT #1: In doing some poking around with nmap (see comments on this post), I'm thinking the issue is in my /etc/hosts files. This is what is listed for the master VM:

127.0.0.1    localhost
127.0.1.1    master
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3

For each slave VM:

127.0.0.1    localhost
127.0.1.1    slaveX
192.168.1.10 master
192.168.1.1X slaveX

Unfortunately, I'm not sure what I changed, but the NameNode now always dies with an exception about trying to bind a port that's "already in use" (127.0.1.1:54310). I'm clearly doing something wrong with the hostnames and IP addresses, but I'm really not sure what it is. Thoughts?

Solution

I found it! By commenting out the second line of the /etc/hosts file (the one with the 127.0.1.1 entry), netstat shows the NameNode ports binding to the 192.168.1.10 address instead of the local one, and the slave VMs found it. Ahhhhhhhh. Mystery solved! Thanks for everyone's help.
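In other words, the master's /etc/hosts ends up looking roughly like this (a sketch based on the listing above, with the offending line commented out):

127.0.0.1    localhost
# 127.0.1.1  master    (commented out so that "master" resolves to 192.168.1.10)
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3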
