could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation




I don't know how to fix this error:

Vertex failed, vertexName=initialmap, vertexId=vertex_1449805139484_0001_1_00, diagnostics=[Task failed, taskId=task_1449805139484_0001_1_00_000003, diagnostics=[AttemptID:attempt_1449805139484_0001_1_00_000003_0 Info:Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/gridmix-kon/input/_temporary/1/_temporary/attempt_14498051394840_0001_m_000003_0/part-m-00003/segment-121 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)

Any idea what's the case?

Solution

This error occurs in BlockManager::chooseTarget4NewBlock() (I am referring to the latest code). The specific piece of code that causes it is:

final DatanodeStorageInfo[] targets = blockplacement.chooseTarget(src,
    numOfReplicas, client, excludedNodes, blocksize, 
    favoredDatanodeDescriptors, storagePolicy);

if (targets.length < minReplication) {
  throw new IOException("File " + src + " could only be replicated to "
      + targets.length + " nodes instead of minReplication (="
      + minReplication + ").  There are "
      + getDatanodeManager().getNetworkTopology().getNumOfLeaves()
      + " datanode(s) running and "
      + (excludedNodes == null? "no": excludedNodes.size())
      + " node(s) are excluded in this operation.");
}

This occurs when the BlockManager tries to choose a target host for storing a new block of data and cannot find a single host (targets.length < minReplication). minReplication is set to 1 (configuration parameter: dfs.namenode.replication.min) in the hdfs-site.xml file.
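The Hadoop default for this parameter is 1, so even finding one usable target is enough; if it has been raised in your hdfs-site.xml, correspondingly more targets must be found before a write can succeed. The relevant property looks like this (a sketch; the value shown is the default):

```xml
<!-- hdfs-site.xml: minimum number of replicas a write must reach -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value>
</property>
```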

This could occur due to one of the following reasons:

  • Data Node instances are not running
  • Data Node instances are unable to contact the Name Node
  • Data Nodes have run out of space, hence no new block of data can be allocated to them
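On the last point, note that the NameNode's default placement policy only considers a Data Node a good target if it has room for at least a full block (dfs.blocksize, 128 MB by default in Hadoop 2.x), not merely a few free bytes. A minimal sketch of that arithmetic, using illustrative numbers (the remaining-bytes figure is one you would read off a dfsadmin report):

```shell
# Sketch: a datanode generally needs room for at least one full block
# to be chosen as a write target. The numbers here are illustrative.
blocksize=$((128 * 1024 * 1024))   # dfs.blocksize default in Hadoop 2.x
remaining=268675973120             # "DFS Remaining" bytes from a dfsadmin report

if [ "$remaining" -ge "$blocksize" ]; then
  echo "node has room for a new block"
else
  echo "node is too full to accept a new block"
fi
```

A node can therefore report nonzero free space yet still be rejected as a target when that space is smaller than one block.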

But in your case, the error message also contains the following information:

There are 4 datanode(s) running and no node(s) are excluded in this operation.

It means there are 4 Data Nodes running, and all 4 Data Nodes were considered for placement of the data in this operation.

So, a likely suspect is disk space on the Data Nodes. You can check the disk space on your Data Nodes using the following command:

hdfs dfsadmin -report

It gives a report for each of your live Data Nodes. For example, in my case, I got the following:

Live datanodes (1):

Name: 192.168.56.1:50010 (192.168.56.1)
Hostname: 192.168.56.1
Decommission Status : Normal
Configured Capacity: 648690003968 (604.14 GB)
DFS Used: 193849055737 (180.54 GB)
Non DFS Used: 186164975111 (173.38 GB)
DFS Remaining: 268675973120 (250.22 GB)
DFS Used%: 29.88%
DFS Remaining%: 41.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Dec 13 17:17:34 IST 2015

Check the "DFS-Remaining" and "DFS-Remaining%". That should give you an idea about the remaining space on your Data Nodes.

You can also refer to the wiki here: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo, which describes the reasons for this error and ways to mitigate it.
