Why does HDFS store data in /tmp by default?
In HDFS:
- the NameNode stores the location of blocks in the folder dfs.namenode.name.dir
- DataNodes store the actual data blocks in the folder dfs.datanode.data.dir
Together, these two properties make up the most important part of HDFS: where your data is saved.
By default:
- dfs.namenode.name.dir and dfs.datanode.data.dir are sub-directories inside file://${hadoop.tmp.dir} (see hdfs-default.xml)
- ${hadoop.tmp.dir} is /tmp/hadoop-${user.name} (see core-default.xml)
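To keep data out of /tmp, the usual fix is to override these two properties in hdfs-site.xml. A minimal sketch, assuming /data/hadoop is a persistent disk on your nodes (the paths are placeholders, not Hadoop defaults):

```xml
<!-- hdfs-site.xml: point HDFS storage at a persistent location -->
<configuration>
  <!-- NameNode metadata (fsimage and edit logs); placeholder path -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfs/name</value>
  </property>
  <!-- DataNode block storage; comma-separate values to use multiple disks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/dfs/data</value>
  </property>
</configuration>
```

Alternatively, setting hadoop.tmp.dir in core-site.xml moves both of these defaults (and everything else derived from it) in one step.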
In short, HDFS stores your data in /tmp by default, and /tmp is emptied often in modern Linux distros.
Why does HDFS store data in /tmp by default? Why would anyone want their data to be temporary?
Because Hadoop makes no assumptions about your file structure, wants to be straightforward to install, and guides users to override those properties during proper configuration.
Most Linux distros have /tmp, and it's publicly writable by all users; it's not clear that /etc, /var, or /mnt fit that criterion.
Obviously no one wants their data to be temporary, but the defaults aren't meant to be production-ready, either. For example, fs.defaultFS is only the local filesystem by default, and there's only one copy of each file.
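The same point applies to that last default: out of the box, Hadoop reads and writes the local filesystem, which holds a single copy of each file. A hedged sketch of the override a real cluster would use in core-site.xml (the hostname namenode.example.com and port are illustrative assumptions, not defaults):

```xml
<!-- core-site.xml: use HDFS instead of the local filesystem -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```

Once the default filesystem is HDFS, block replication is governed by dfs.replication, whose hdfs-default.xml value is 3.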