Hadoop: Datanode process killed
Problem Description
I am currently using Hadoop-2.0.3-alpha. After I could work with HDFS perfectly (copying files into HDFS, accessing it successfully from an external framework, using the web frontend), the datanode process stops a while after each fresh start of my VM. The namenode process and all YARN processes work without a problem. I installed Hadoop in a folder under an additional user, as I also still have Hadoop 0.2 installed, which worked fine too. Looking at the log file of the datanode process, I got the following information:
2013-04-11 16:23:50,475 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-11 16:24:17,451 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-04-11 16:24:23,276 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-04-11 16:24:23,279 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-04-11 16:24:23,480 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is user-VirtualBox
2013-04-11 16:24:28,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2013-04-11 16:24:29,239 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2013-04-11 16:24:38,348 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-04-11 16:24:44,627 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-04-11 16:24:45,163 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2013-04-11 16:24:45,164 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2013-04-11 16:24:45,164 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2013-04-11 16:24:45,355 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:50075
2013-04-11 16:24:45,508 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-04-11 16:24:45,536 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2013-04-11 16:24:45,576 INFO org.mortbay.log: jetty-6.1.26
2013-04-11 16:25:18,416 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2013-04-11 16:25:42,670 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2013-04-11 16:25:44,955 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2013-04-11 16:25:45,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2013-04-11 16:25:47,079 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2013-04-11 16:25:47,660 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to localhost/127.0.0.1:8020 starting to offer service
2013-04-11 16:25:50,515 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-04-11 16:25:50,631 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2013-04-11 16:26:15,068 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data/in_use.lock acquired by nodename 3099@user-VirtualBox
2013-04-11 16:26:15,720 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363) service to localhost/127.0.0.1:8020
java.io.IOException: Incompatible clusterIDs in /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data: namenode clusterID = CID-1745a89c-fb08-40f0-a14d-d37d01f199c3; datanode clusterID = CID-bb3547b0-03e4-4588-ac25-f0299ff81e4f
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:850)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:821)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
	at java.lang.Thread.run(Thread.java:722)
2013-04-11 16:26:16,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363) service to localhost/127.0.0.1:8020
2013-04-11 16:26:16,276 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-474150866-127.0.1.1-1365686732002 (storage id DS-317990214-127.0.1.1-50010-1365505141363)
2013-04-11 16:26:18,396 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2013-04-11 16:26:18,940 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2013-04-11 16:26:19,668 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at user-VirtualBox/127.0.1.1
************************************************************/
Any ideas? Maybe I made a mistake during the installation process? But it is strange that it worked once. I should also mention that when I am logged in as my additional user and execute the commands
./hadoop-daemon.sh start namenode
(and the same for the datanode), I need to prefix them with sudo.
I used this installation guide: http://jugnu-life.blogspot.ie/2012/0...rial-023x.html
By the way, I am using Oracle Java 7.
Solution
The problem could be that the namenode was formatted after the cluster was set up and the datanodes were not, so the slaves are still referring to the old namenode. Formatting a namenode generates a new clusterID, while the datanode's storage directory keeps the old one; that is exactly the Incompatible clusterIDs exception shown in the log above.
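To confirm the mismatch, you can compare the clusterID fields of the two VERSION files that HDFS keeps in its storage directories. A minimal sketch: the datanode path is taken from the log above, but the namenode path is only a guess, so substitute whatever dfs.namenode.name.dir (or the older dfs.name.dir) points to in your configuration:

cat /home/hadoop/workspace/hadoop_space/hadoop23/dfs/name/current/VERSION   # namenode clusterID (path assumed)
cat /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data/current/VERSION   # datanode clusterID (path from the log)

If the two clusterID lines differ, the datanode storage was initialized against a namenode that has since been reformatted.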
We have to delete and recreate the folder /home/hadoop/dfs/data on the local filesystem of the datanode:
- Check your hdfs-site.xml file to see where dfs.data.dir is pointing,
- delete that folder,
- and then restart the datanode daemon on the machine.
The steps above should recreate the folder and resolve the problem; a sketch of the commands follows below.
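A minimal sketch of those commands, assuming the data directory from the log and the hadoop-daemon.sh script mentioned in the question (adjust the path to whatever dfs.data.dir is set to in your hdfs-site.xml):

./hadoop-daemon.sh stop datanode
# remove the stale datanode storage; this also deletes the blocks stored on this node
rm -rf /home/hadoop/workspace/hadoop_space/hadoop23/dfs/data
./hadoop-daemon.sh start datanode

On startup the datanode recreates the directory and adopts the namenode's current clusterID during the registration handshake.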
Please share your config info if the instructions above do not work.