Hadoop Datanodes cannot find NameNode

Problem description

I've set up a distributed Hadoop environment within VirtualBox: 4 virtual Ubuntu 11.10 installations, one acting as the master node and the other three as slaves. I followed this tutorial to get the single-node version up and running, then converted it to the fully distributed version. It was working just fine when I was running 11.04; however, when I upgraded to 11.10, it broke. Now all my slaves' logs show the following error message, repeated ad nauseam:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 0 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 1 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 2 time(s).

And so on. I've found other instances of this error message on the Internet (and on StackOverflow), but none of the solutions have worked. I tried changing the core-site.xml and mapred-site.xml entries to the IP address rather than the hostname; I quadruple-checked /etc/hosts on all slaves and the master; and the master can SSH password-less into all slaves. I even tried reverting each slave to a single-node setup, and they all work fine in that case (on that note, the master always works fine as both a Datanode and the Namenode).

The only symptom I've found that seems to give a lead is that when I attempt a telnet 192.168.1.10 54310 from any of the slaves, I get Connection refused, suggesting there is some rule blocking access (which must have gone into effect when I upgraded to 11.10).
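
The telnet test can also be scripted. Below is a minimal sketch, assuming Python 3 is available on the slaves; the function name probe_port is hypothetical and not part of Hadoop. It separates a refused connection from one that times out or fails for another reason. As a general rule, a refusal means the host answered but nothing accepted the connection on that address and port, whereas a firewall that silently drops packets usually shows up as a timeout. A firewall configured to reject connections can also produce a refusal, so this is a hint rather than proof.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try a TCP connection and classify the outcome.

    Returns "open" if the connection succeeds, "refused" if the remote
    end actively rejects it, and "unreachable" for timeouts and any
    other socket-level failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, no route to host, name resolution errors, etc.
        return "unreachable"
```

Calling probe_port("192.168.1.10", 54310) from a slave would be the equivalent of the telnet attempt described above.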

My /etc/hosts.allow has not changed, however. I tried the rule ALL: 192.168.1., but it did not change the behavior.

Oh yes, and netstat on the master clearly shows that TCP ports 54310 and 54311 are listening.

Does anyone have any suggestions for getting the slave Datanodes to recognize the Namenode?

EDIT #1: After some poking around with nmap (see the comments on this post), I think the issue is in my /etc/hosts files. This is what is listed for the master VM:

127.0.0.1    localhost
127.0.1.1    master
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3

And for each of the slave VMs:

127.0.0.1    localhost
127.0.1.1    slaveX
192.168.1.10 master
192.168.1.1X slaveX

Unfortunately, I'm not sure what I changed, but the NameNode now always dies with an exception about trying to bind to a port that is "already in use" (127.0.1.1:54310). I'm clearly doing something wrong with the hostnames and IP addresses, but I'm really not sure what it is. Thoughts?
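
Since the bind address in that exception is a loopback address, one useful check is to ask the master's resolver which addresses a hostname maps to. The following is a rough sketch, assuming Python 3 on the machine being checked; the helper names resolved_ipv4 and resolves_to_loopback are hypothetical. It only reports what the local resolver returns and does not inspect Hadoop's configuration.

```python
import ipaddress
import socket


def resolved_ipv4(hostname):
    """Return the IPv4 addresses the local resolver gives for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def resolves_to_loopback(hostname):
    """True if any resolved address lies in the loopback range 127.0.0.0/8."""
    return any(
        ipaddress.ip_address(addr).is_loopback
        for addr in resolved_ipv4(hostname)
    )
```

Run on the master VM, resolves_to_loopback("master") returning True would suggest that services started under that name may listen only on a loopback address, where the slaves cannot reach them.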

Recommended answer

I found it! After I commented out the second line of the /etc/hosts file (the one with the 127.0.1.1 entry), netstat shows the NameNode ports bound to the 192.168.1.10 address instead of the local one, and the slave VMs found it. Ahhhhhhhh. Mystery solved! Thanks for everyone's help.
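
To spot the same misconfiguration on other machines, the hosts file can be scanned for hostnames mapped to a loopback address. This is a minimal sketch, assuming Python 3; the function name loopback_hostnames is hypothetical, and the only name it treats as legitimately loopback is localhost.

```python
import ipaddress


def loopback_hostnames(hosts_text):
    """Map each non-"localhost" name to the loopback address it is bound to.

    hosts_text is the content of an /etc/hosts style file. Comments
    (everything after '#') and lines that do not start with a valid
    IP address are ignored.
    """
    flagged = {}
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue
        if not address.is_loopback:
            continue
        for name in fields[1:]:
            if name != "localhost":
                flagged[name] = str(address)
    return flagged
```

Given the master's original file from the question, the function flags master as mapped to 127.0.1.1; once that line is commented out, it returns an empty dictionary.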
