Data Replication error in Hadoop


Problem description

I am setting up a Hadoop single-node cluster on my machine by following Michael Noll's tutorial and have come across a data replication error.

Here's the full error message:

> hadoop@laptop:~/hadoop$ bin/hadoop dfs -copyFromLocal
> tmp/testfiles testfiles
> 
> 12/05/04 16:18:41 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1   at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)     at
> java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)   at
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)  at
> $Proxy0.addBlock(Unknown Source)    at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy0.addBlock(Unknown Source)     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
> 
> 12/05/04 16:18:41 WARN hdfs.DFSClient: Error Recovery for block null
> bad datanode[0] nodes == null 12/05/04 16:18:41 WARN hdfs.DFSClient:
> Could not get block locations. Source file
> "/user/hadoop/testfiles/testfiles/file1.txt" - Aborting...
> copyFromLocal: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1 12/05/04 16:18:41 ERROR hdfs.DFSClient:
> Exception closing file /user/hadoop/testfiles/testfiles/file1.txt :
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1   at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)     at
> java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1   at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)     at
> java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)   at
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)  at
> $Proxy0.addBlock(Unknown Source)    at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy0.addBlock(Unknown Source)     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

Also when I execute:

bin/stop-all.sh

It says that the datanode has not been started and thus cannot be stopped, even though the output of jps shows the datanode as present.
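
A minimal way to cross-check this discrepancy (a sketch; the log location and the hadoop-<user>-datanode-<host>.log filename are the Hadoop 1.x defaults and are an assumption about this setup):

    # List running Hadoop JVMs; a healthy single-node setup shows NameNode,
    # DataNode, SecondaryNameNode, JobTracker and TaskTracker.
    jps

    # Inspect the datanode's own log for startup errors (assumed default
    # location under $HADOOP_HOME/logs; adjust user/host in the name).
    tail -n 50 logs/hadoop-hadoop-datanode-laptop.log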

I tried formatting the namenode and changing owner permissions, but it does not seem to work. Hope I didn't miss any other relevant information.
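
For reference, the re-format I attempted looked roughly like this (a sketch, assuming the tutorial's hadoop.tmp.dir=/app/hadoop/tmp; it wipes everything stored in HDFS, so it is only sensible on a sandbox):

    # Stop all daemons before touching on-disk state.
    bin/stop-all.sh
    # Clear old namenode/datanode state (assumption: single-node sandbox).
    rm -rf /app/hadoop/tmp/*
    # Re-initialize the HDFS namespace, then bring everything back up.
    bin/hadoop namenode -format
    bin/start-all.sh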

Thanks in advance.

Solution

The solution that worked for me was to start the namenode and datanode one by one rather than together with bin/start-all.sh. With this approach, any error in bringing a datanode onto the network is clearly visible. Also, many posts on Stack Overflow suggest that the namenode needs some time to start up, so it should be given that time before the datanodes are started. Additionally, in this case I was hitting a mismatch between the namenode's and datanodes' ids, and I had to change the datanode's id to match the namenode's (see the sketch below).
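
Concretely, the id in question is the namespaceID stored in each storage directory's VERSION file, and the fix is to copy the namenode's value into the datanode's file. A sketch, assuming the tutorial's default layout under /app/hadoop/tmp (adjust to your dfs.name.dir and dfs.data.dir):

    # Read the namenode's namespaceID (assumed dfs.name.dir location).
    grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION

    # Edit the datanode's VERSION file so its namespaceID matches the
    # value printed above, then restart the datanode.
    vi /app/hadoop/tmp/dfs/data/current/VERSION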

The step-by-step procedure is as follows (a consolidated shell sketch appears after the list):

  1. Start the namenode with bin/hadoop namenode. Check for errors, if any.
  2. Start the datanodes with bin/hadoop datanode. Check for errors, if any.
  3. Now start the tasktracker and jobtracker using bin/start-mapred.sh.
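
Put together as shell commands (each daemon in its own terminal so its startup errors stay visible; the final copy retries the command that originally failed):

    # Terminal 1: start the namenode in the foreground and watch for errors.
    bin/hadoop namenode

    # Terminal 2: once the namenode is up, start the datanode the same way.
    bin/hadoop datanode

    # Terminal 3: with HDFS healthy, start the jobtracker and tasktracker.
    bin/start-mapred.sh

    # Verify all daemons registered, then retry the copy.
    jps
    bin/hadoop dfs -copyFromLocal tmp/testfiles testfiles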
