There are 0 datanode(s) running and no node(s) are excluded in this operation

This article describes how to handle the error "There are 0 datanode(s) running and no node(s) are excluded in this operation"; it should be a useful reference for anyone troubleshooting the same problem.

Problem Description

I have set up a multi-node Hadoop cluster. The NameNode and Secondary NameNode run on the same machine, and the cluster has only one DataNode. All the nodes are configured on Amazon EC2 machines.

Following are the configuration files on the master node

masters
54.68.218.192 (public IP of the master node)

slaves
54.68.169.62 (public IP of the slave node)


core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>


mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>


hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

Following are the configuration files on the datanode

core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://54.68.218.192:10001</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>54.68.218.192:10002</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

jps run on the NameNode gives the following:

5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager

and jps on the datanode:

2883 DataNode
3496 Jps
3381 NodeManager

which to me seems right.

Now when I try to run a put command:

hadoop fs -put count_inputfile /test/input/

it gives me the following error:

put: File /count_inputfile.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
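For reference, what the NameNode itself sees can be checked from the command line (assuming the Hadoop binaries are on the PATH); when this error occurs, the report typically shows 0 live datanodes:

# Ask the NameNode how many datanodes have registered with it
hdfs dfsadmin -report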

The logs on the datanode say the following:

hadoop-datanode log
INFO org.apache.hadoop.ipc.Client: Retrying connect to server:      54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
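This log shows the DataNode retrying against 54.68.218.192:10001, while the master's core-site.xml above binds fs.default.name to hdfs://localhost:9000, so nothing would be listening at that public address and port. One way to confirm this from each side (a sketch, assuming netstat and nc are installed):

# On the master: see which address/port the NameNode RPC is actually bound to
sudo netstat -tlpn | grep 9000
# From the datanode: check whether the NameNode address in core-site.xml is reachable
nc -zv 54.68.218.192 10001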

yarn-nodemanager log
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
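0.0.0.0:8031 is the default resource-tracker address a NodeManager falls back to when yarn.resourcemanager.hostname is not set in yarn-site.xml. A hedged example of that property (the value here is assumed to be the master's address; adjust for your setup):

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>54.68.218.192</value>
</property>
</configuration>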

The web UI on port 50070 (the NameNode UI) shows that there are 0 live nodes and 0 dead nodes, and the DFS used is 100%.

I have also disabled IPv6. On a few websites I found that I should also edit the /etc/hosts file, which I have done; it now looks like this:

127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal
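As a sanity check, name resolution for those entries can be verified on each node (getent reads the same sources the resolver uses):

getent hosts ip-172-31-25-151.us-west-2.compute.internal
getent hosts ip-172-31-25-152.us-west-2.compute.internal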

Why am I still getting the error?

Solution

Two things worked for me:

STEP 1: stop Hadoop and clean the temp files from hduser
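The stopping itself is not spelled out here; with a stock Hadoop 2.x installation the daemons are stopped with the bundled sbin scripts (a sketch, assuming they are on the PATH):

stop-yarn.sh
stop-dfs.sh

Then clean the temp files: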

sudo rm -R /tmp/*

Also, you may need to delete and recreate /app/hadoop/tmp (mostly needed when I changed the Hadoop version from 2.2.0 to 2.7.0):

sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
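For context, /app/hadoop/tmp is the sort of path usually configured as hadoop.tmp.dir in core-site.xml; a hedged example of that property (the path is taken from the commands above, your value may differ):

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
</configuration>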

STEP 2: format the namenode

hdfs namenode -format
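Note that formatting assigns the NameNode a fresh clusterID; a DataNode whose storage directory still holds the old clusterID will refuse to register. A cautious sketch (the path comes from the hdfs-site.xml above, and this deletes all HDFS block data):

# On each datanode: clear the stale storage so the clusterIDs match after the format
sudo rm -rf /usr/local/hadoop_store/hdfs/datanode/*
# Restart the daemons
start-dfs.sh
start-yarn.sh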

Now, I can see the DataNode:

hduser@prayagupd:~$ jps
19135 NameNode
20497 Jps
19477 DataNode
20447 NodeManager
19902 SecondaryNameNode
20106 ResourceManager
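With the DataNode visible again, the original upload should succeed; reusing the paths from the question (the target directory may need to be created first):

hadoop fs -mkdir -p /test/input
hadoop fs -put count_inputfile /test/input/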

This concludes the article on "There are 0 datanode(s) running and no node(s) are excluded in this operation"; hopefully the recommended answer above is helpful.
