Hadoop: File ... could only be replicated to 0 nodes, instead of 1


Problem description



I am trying to deploy Hadoop-RDMA on 8 node IB (OFED-1.5.3-4.0.42) cluster and got into the following problem (a.k.a File ... could only be replicated to 0 nodes, instead of 1):

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
Warning: $HADOOP_HOME is deprecated.

14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.From.Code(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.The.run(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
    at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
    at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
    ... 12 more

14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed

It seems that data is not transferred to DataNodes when I start copying from local filesystem to HDFS. I tested availability of DataNodes:

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.

Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (4 total, 4 dead)

Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:55 MSK 2014

I also tried a mkdir in the HDFS filesystem, which succeeded. Restarting the Hadoop daemons did not produce any positive effect.

Could you please help me with this issue? Thank you.

Best, Alex

Solution

I have found my problem. The issue was related to the configuration of hadoop.tmp.dir, which had been set to an NFS partition. By default it is configured under /tmp, which is on the local filesystem. After removing hadoop.tmp.dir from core-site.xml, the problem was solved.
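For reference, the property in question lives in core-site.xml. A minimal sketch of a corrected file is below; the NameNode address hdfs://A11:9000 is a hypothetical value for this cluster, and the hadoop.tmp.dir property can simply be omitted entirely to fall back to the node-local default /tmp/hadoop-${user.name}:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: hadoop.tmp.dir must resolve to a node-local path,
     not an NFS mount shared by all DataNodes. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- hypothetical NameNode address for this cluster -->
    <value>hdfs://A11:9000</value>
  </property>
  <!-- Either delete any hadoop.tmp.dir override, or pin it to local disk: -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
</configuration>
```

After restarting the daemons, ./bin/hadoop dfsadmin -report should list the DataNodes as live and show non-zero configured capacity.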
