No data nodes are started

Problem description

I am trying to set up Hadoop version 0.20.203.0 in a pseudo-distributed configuration using the following guide:

http://www.javacodegeeks.com/2012/01/hadoop-modes-explained-standalone.html

After running the start-all.sh script, I run "jps".

I get this output:

4825 NameNode
5391 TaskTracker
5242 JobTracker
5477 Jps
5140 SecondaryNameNode

When I try to add information to HDFS using:

bin/hadoop fs -put conf input

I get this error:

hadoop@m1a2:~/software/hadoop$ bin/hadoop fs -put conf input
12/04/10 18:15:31 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

        at org.apache.hadoop.ipc.Client.call(Client.java:1030)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

12/04/10 18:15:31 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/10 18:15:31 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/input/core-site.xml" - Aborting...
put: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
12/04/10 18:15:31 ERROR hdfs.DFSClient: Exception closing file /user/hadoop/input/core-site.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

        at org.apache.hadoop.ipc.Client.call(Client.java:1030)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

I am not totally sure, but I believe this may have to do with the fact that the DataNode is not running.

Does anybody know what I have done wrong, or how to fix this problem?

This is the datanode.log file:

2012-04-11 12:27:28,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = m1a2/139.147.5.55
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
2012-04-11 12:27:29,166 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-11 12:27:29,181 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-04-11 12:27:29,342 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-04-11 12:27:29,347 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-04-11 12:27:29,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 301052954; datanode namespaceID = 229562149
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)

2012-04-11 12:27:29,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at m1a2/139.147.5.55
************************************************************/

Recommended answer

That error you are getting in the DN log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids

From that page:

At the moment, there seem to be two workarounds as described below.

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won't make you happy (me neither). The crude workaround I have found is to:

  1. Stop the cluster.
  2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data.
  3. Reformat the NameNode (NOTE: all HDFS data is lost during this process!).
  4. Restart the cluster (a shell sketch of these steps follows below).
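
A rough shell sketch of those four steps, assuming the cluster was set up per the tutorial above with dfs.data.dir at /app/hadoop/tmp/dfs/data (adjust the path to whatever dfs.data.dir points to on your machine; on the asker's setup the datanode.log above shows /tmp/hadoop-hadoop/dfs/data):

# 1. Stop the cluster
bin/stop-all.sh
# 2. Delete the data directory on the problematic DataNode
#    (check dfs.data.dir in conf/hdfs-site.xml first)
rm -rf /app/hadoop/tmp/dfs/data
# 3. Reformat the NameNode -- ALL HDFS DATA WILL BE LOST
bin/hadoop namenode -format
# 4. Restart the cluster
bin/start-all.sh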

If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be OK during the initial setup/testing), you might give the second approach a try.

Workaround 2: Update the namespaceID of the problematic DataNode

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is "minimally invasive" as you only have to edit one file on the problematic DataNodes:

  1. Stop the DataNode.
  2. Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode.
  3. Restart the DataNode.

If you followed the instructions in my tutorials, the full paths of the relevant files are:

NameNode: /app/hadoop/tmp/dfs/name/current/VERSION

DataNode: /app/hadoop/tmp/dfs/data/current/VERSION
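
Put together as a shell sketch, assuming the tutorial paths above (the sed one-liner is my own illustration, not part of the original answer; editing the file by hand works just as well):

# 1. Stop the DataNode
bin/hadoop-daemon.sh stop datanode
# 2. Copy the NameNode's namespaceID into the DataNode's VERSION file
NN_ID=$(grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION | cut -d= -f2)
sed -i "s/^namespaceID=.*/namespaceID=${NN_ID}/" /app/hadoop/tmp/dfs/data/current/VERSION
# 3. Restart the DataNode
bin/hadoop-daemon.sh start datanode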

(Background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir in this tutorial to /app/hadoop/tmp.)
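
For reference, a sketch of what the corresponding configuration entries from that tutorial would look like (values assumed from the tutorial; dfs.data.dir need not be set explicitly since this is already the default):

<!-- conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>

<!-- conf/hdfs-site.xml (optional; shown only to make the default explicit) -->
<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
</property>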

If you wonder what the contents of VERSION look like, here's one of mine:

# contents of /current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13
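
A quick way to confirm the mismatch before the fix, and that it is gone afterwards (assuming the tutorial paths), is to print the namespaceID recorded on each side; the two values must be identical:

# the two lines printed below must show the same namespaceID
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION
grep namespaceID /app/hadoop/tmp/dfs/data/current/VERSION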
