Hadoop HDFS - Cannot connect to port on master


Problem description


I've set up a small Hadoop cluster for testing. Setup went fairly well with the NameNode (1 machine), SecondaryNameNode (1) and all DataNodes (3). The machines are named "master", "secondary" and "data01", "data02" and "data03". All DNS are properly set up, and passwordless SSH was configured from master/secondary to all machines and back.

I formatted the cluster with bin/hadoop namenode -format, and then started all services using bin/start-all.sh. All processes on all nodes were checked to be up and running with jps. My basic configuration files look something like this:

<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- 
      on the master it's localhost
      on the others it's the master's DNS
      (ping works from everywhere)
    -->
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- I picked /hdfs for the root FS -->
    <value>/hdfs/tmp</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

# conf/masters
secondary

# conf/slaves
data01
data02
data03

I'm just trying to get HDFS running properly now.

I've created a directory for testing with hadoop fs -mkdir testing, then tried to copy some files into it with hadoop fs -copyFromLocal /tmp/*.txt testing. This is where Hadoop crashes, giving me more or less this:

WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hd/testing/wordcount1.txt could only be replicated to 0 nodes, instead of 1
  at ... (such and such)

WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
  at ...

WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hd/testing/wordcount1.txt" - Aborting...
  at ...

ERROR hdfs.DFSClient: Exception closing file /user/hd/testing/wordcount1.txt: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hd/testing/wordcount1.txt could only be replicated to 0 nodes, instead of 1
  at ...

And so on. A similar issue occurs when I try to run hadoop fs -lsr . from a DataNode machine, only to get the following:

12/01/02 10:02:11 INFO ipc.Client: Retrying connect to server master/192.162.10.10:9000. Already tried 0 time(s).
12/01/02 10:02:12 INFO ipc.Client: Retrying connect to server master/192.162.10.10:9000. Already tried 1 time(s).
12/01/02 10:02:13 INFO ipc.Client: Retrying connect to server master/192.162.10.10:9000. Already tried 2 time(s).
...

I'm saying it's similar, because I suspect this is a port availability issue. Running telnet master 9000 reveals that the port is closed. I've read somewhere that this might be an IPv6 clash issue, and thus defined the following in conf/hadoop-env.sh:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

But that didn't do the trick. Running netstat on the master reveals something like this:

Proto Recv-Q Send-Q  Local Address       Foreign Address      State
tcp        0      0  localhost:9000      localhost:56387      ESTABLISHED
tcp        0      0  localhost:56386     localhost:9000       TIME_WAIT
tcp        0      0  localhost:56387     localhost:9000       ESTABLISHED
tcp        0      0  localhost:56384     localhost:9000       TIME_WAIT
tcp        0      0  localhost:56385     localhost:9000       TIME_WAIT
tcp        0      0  localhost:56383     localhost:9000       TIME_WAIT

At this point I'm pretty sure the problem is with the port (9000), but I'm not sure what I missed as far as configuration goes. Any ideas? Thanks.
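One way to pin that down (a sketch, assuming a Linux master with net-tools or lsof installed) is to check which address the NameNode's RPC port is actually bound to; 127.0.0.1 would mean only local clients can reach it, which matches the closed telnet port and the DataNodes' connection retries:

# On the master, after bin/start-all.sh: which address is port 9000 bound to?
netstat -tlnp | grep 9000
# or, alternatively:
sudo lsof -iTCP:9000 -sTCP:LISTEN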

Update

I found that hard-coding the DNS names into /etc/hosts not only helps resolve this, but also speeds up the connections. The downside is that you have to do this on all the machines in the cluster, and again when you add new nodes. Or you can just set up a DNS server, which I didn't.

Here's a sample from one node in my cluster (nodes are named hadoop01, hadoop02, etc., with the master and secondary being 01 and 02). Note that most of it is generated by the OS:

# this is a sample for a machine with dns hadoop01
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# --- Start list of nodes
192.168.10.101 hadoop01
192.168.10.102 hadoop02
192.168.10.103 hadoop03
192.168.10.104 hadoop04
192.168.10.105 hadoop05
192.168.10.106 hadoop06
192.168.10.107 hadoop07
192.168.10.108 hadoop08
192.168.10.109 hadoop09
192.168.10.110 hadoop10
# ... and so on

# --- End list of nodes

# Auto-generated hostname. Please do not remove this comment.
127.0.0.1 hadoop01 localhost localhost.localdomain
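With a file like the one above, it is worth double-checking (a hedged aside; getent is assumed to be available) what the node's own hostname resolves to, since the NameNode binds port 9000 to whatever address the host named in fs.default.name maps to:

# run on hadoop01 itself
hostname                 # should print hadoop01
getent hosts hadoop01    # should print 192.168.10.101, not 127.0.0.1

Because the 192.168.10.101 entry comes before the auto-generated 127.0.0.1 line, the first match wins and the name resolves to the external address.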

Hope this helps.

Solution

Replace localhost in hdfs://localhost:9000 with the NameNode's IP address or hostname in the fs.default.name property whenever remote nodes need to connect to the NameNode.
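A minimal sketch of that change for the original naming scheme (assuming "master" resolves to the NameNode's external address on every machine; the same file goes on the master itself, i.e. no localhost there either):

<!-- conf/core-site.xml, identical on the NameNode and all DataNodes -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- the NameNode's hostname or IP, never localhost -->
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
</configuration>

After changing it, restart the daemons (bin/stop-all.sh followed by bin/start-all.sh) so the NameNode rebinds port 9000 to the external address.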

"All processes on all nodes were checked to be up and running with jps"

There might still be errors in the log files; jps only confirms that the process is running.
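For that log check, something along these lines is usually enough (a sketch; the logs/ location and file-name pattern are assumptions based on a default Hadoop 1.x-style install with $HADOOP_HOME pointing at it):

# On the master: scan the NameNode log for problems
grep -iE "error|exception" $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20

# On a DataNode: same for the DataNode log
grep -iE "error|exception" $HADOOP_HOME/logs/hadoop-*-datanode-*.log | tail -n 20

# And ask the NameNode how many DataNodes it actually sees
hadoop dfsadmin -report

If dfsadmin -report shows 0 live DataNodes, that matches the "could only be replicated to 0 nodes" error from the question.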
