Why do we need to format HDFS after every time we restart machine?


Problem description


I have installed Hadoop in pseudo-distributed mode on my laptop; the OS is Ubuntu.

I have changed the path where Hadoop stores its data (by default, Hadoop stores data in the /tmp folder).

The hdfs-site.xml file looks like this:

<property>
    <name>dfs.data.dir</name>
    <value>/HADOOP_CLUSTER_DATA/data</value>
</property>

Now whenever I restart the machine and try to start the Hadoop cluster using the start-all.sh script, the data node never starts. I confirmed that the data node did not start by checking the logs and by using the jps command.

Then I:

  1. Stopped cluster using stop-all.sh script.
  2. Formatted HDFS using hadoop namenode -format command.
  3. Started cluster using start-all.sh script.

Now everything works fine, even if I stop and start the cluster again. The problem occurs only when I restart the machine and try to start the cluster.

  • Has anyone encountered a similar problem?
  • Why is this happening, and
  • How can we solve this problem?

Solution

By changing dfs.datanode.data.dir away from /tmp, you indeed made the data (the blocks) survive across a reboot. However, there is more to HDFS than just blocks. You need to make sure all the relevant dirs point away from /tmp, most notably dfs.namenode.name.dir, which holds the namenode's metadata and still defaults to a location under /tmp that Ubuntu clears on reboot (I can't tell what other dirs you have to change; it depends on your config, but the namenode dir is mandatory and could also be sufficient).
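A minimal sketch of what hdfs-site.xml could look like with both directories moved off /tmp. The property names follow Hadoop 2.x; /HADOOP_CLUSTER_DATA/name is a hypothetical path chosen for illustration, mirroring the data dir from the question:

```xml
<configuration>
  <!-- Where the datanode keeps the actual blocks (already moved off /tmp) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/HADOOP_CLUSTER_DATA/data</value>
  </property>
  <!-- Where the namenode keeps the filesystem metadata (fsimage, edits);
       if this stays under /tmp it is wiped on reboot, forcing a re-format -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/HADOOP_CLUSTER_DATA/name</value>
  </property>
</configuration>
```

On Hadoop 1.x the equivalent keys are dfs.data.dir and dfs.name.dir.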

I would also recommend using a more recent Hadoop distribution. BTW, the 1.1 namenode dir setting is dfs.name.dir.
