Data lost after shutting down hadoop HDFS?


Problem description

Hi, I'm learning Hadoop and I have a simple dumb question: after I shut down HDFS (by calling hadoop_home/sbin/stop-dfs.sh), is the data on HDFS lost, or can I get it back?


Solution

Data wouldn't be lost if you stop HDFS, provided you store the NameNode's and the DataNodes' data in persistent locations, specified using the following properties (a minimal hdfs-site.xml sketch follows the list):

  • dfs.namenode.name.dir -> Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy. Default value: file://${hadoop.tmp.dir}/dfs/name
  • dfs.datanode.data.dir -> Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. Default value: file://${hadoop.tmp.dir}/dfs/data
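
For illustration, here is a minimal hdfs-site.xml sketch that pins both properties to persistent locations. The /data/hadoop/dfs/* paths are hypothetical examples; substitute any directories on a disk that survives reboots:

```xml
<!-- hdfs-site.xml: minimal sketch; the /data/hadoop/dfs/* paths are
     hypothetical examples, any directory outside /tmp will do -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- where the NameNode stores the name table (fsimage) -->
    <value>file:///data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- where each DataNode stores its blocks; a comma-delimited
         list would spread blocks across multiple devices -->
    <value>file:///data/hadoop/dfs/data</value>
  </property>
</configuration>
```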

As you can see, the default values for both properties point to ${hadoop.tmp.dir}, which by default resolves under /tmp. You might already know that data in /tmp on Unix-based systems gets cleared on reboot.
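
Alternatively, instead of setting the two properties individually, you can relocate hadoop.tmp.dir itself in core-site.xml so that both derived defaults leave /tmp. Again, /data/hadoop/tmp is a hypothetical path chosen for this sketch:

```xml
<!-- core-site.xml: relocating hadoop.tmp.dir so the derived
     dfs.namenode.name.dir and dfs.datanode.data.dir defaults no
     longer resolve under /tmp; /data/hadoop/tmp is a hypothetical path -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
```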

So, if you specify directory locations other than /tmp, the Hadoop HDFS daemons will be able to read the data back after a reboot, and hence there will be no data loss even across cluster restarts.
