HDFS vs LFS - How is the Hadoop Distributed File System built over the local file system?

Question

From various blogs I have read, I understand that HDFS is another layer that exists on top of a computer's local file system.

I have also installed Hadoop, but I have trouble understanding the existence of the HDFS layer over the local file system.

Here are my questions:

Consider that I am installing Hadoop in pseudo-distributed mode. What happens under the hood during this installation? I added a tmp.dir parameter in the configuration files. Is this the single folder that the namenode daemon talks to when it attempts to access the datanode?

Answer

OK, let me give it a try. When you configure Hadoop, it lays down a virtual FS on top of your local FS, which is HDFS. HDFS stores data as blocks (similar to the local FS, but much, much bigger in comparison) in a replicated fashion. But the HDFS directory tree, or filesystem namespace, is structured just like that of a local FS. When you start writing data into HDFS, it eventually gets written onto the local FS, but you can't see it there directly.
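
To make "blocks, replicated" concrete, here is a minimal Java sketch (not from the original answer) that asks HDFS for the block size and replication factor of a file. The path /user/demo/sample.txt is hypothetical, and the printed values assume the usual defaults of the era (64 MB blocks, 3 replicas):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath,
        // so the configured default FS decides which FS we talk to.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file; any existing HDFS path works.
        FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));

        System.out.println("Block size : " + status.getBlockSize());   // e.g. 67108864 (64 MB)
        System.out.println("Replication: " + status.getReplication()); // e.g. 3
    }
}
```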

The temp directory (hadoop.tmp.dir) actually serves 3 purposes:

1- The directory where the namenode stores its metadata, with default value ${hadoop.tmp.dir}/dfs/name; it can be specified explicitly by dfs.name.dir. If you specify dfs.name.dir, the namenode metadata will be stored in the directory given as the value of this property.

2- The directory where HDFS data blocks are stored, with default value ${hadoop.tmp.dir}/dfs/data; it can be specified explicitly by dfs.data.dir. If you specify dfs.data.dir, the HDFS data will be stored in the directory given as the value of this property.

3- The directory where the secondary namenode stores its checkpoints, with default value ${hadoop.tmp.dir}/dfs/namesecondary; it can be specified explicitly by fs.checkpoint.dir.

So, for a cleaner setup, it's always better to use proper, dedicated locations as the values of these properties.
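
As a sketch only, here is how those three properties and their ${hadoop.tmp.dir}-relative defaults fit together, using the Hadoop 1.x-era property names from the answer. The /var/hadoop and /data/hadoop paths are hypothetical, and in practice these entries would go into core-site.xml / hdfs-site.xml rather than be set in code:

```java
import org.apache.hadoop.conf.Configuration;

public class DfsDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Hypothetical dedicated locations, one per purpose.
        conf.set("hadoop.tmp.dir",    "/var/hadoop/tmp");
        conf.set("dfs.name.dir",      "/data/hadoop/name");          // namenode metadata
        conf.set("dfs.data.dir",      "/data/hadoop/data");          // HDFS data blocks
        conf.set("fs.checkpoint.dir", "/data/hadoop/namesecondary"); // 2NN checkpoints

        // If a property were left unset, it would fall back to a path
        // under ${hadoop.tmp.dir}, e.g. ${hadoop.tmp.dir}/dfs/name.
        System.out.println("namenode metadata -> " + conf.get("dfs.name.dir"));
        System.out.println("block storage     -> " + conf.get("dfs.data.dir"));
        System.out.println("2NN checkpoints   -> " + conf.get("fs.checkpoint.dir"));
    }
}
```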

When access to a particular block of data is required, the metadata stored in the dfs.name.dir directory is searched, and the location of that block on a particular datanode (somewhere under the dfs.data.dir directory on that node's local FS) is returned to the client. The client then reads the data directly from there (the same holds for writes).
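
As a rough illustration of that read path, the sketch below opens a file through the standard FileSystem client API: open() consults the namenode for block locations, and the returned stream then pulls bytes directly from the datanodes. The path is again hypothetical:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFromHdfs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // open() asks the namenode where the blocks live (dfs.name.dir metadata);
        // the stream then reads directly from the datanodes (dfs.data.dir blocks).
        try (FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt"))) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```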

One important point to note here is that HDFS is not a physical FS. It is rather a virtual abstraction on top of your local FS which can't be browsed simply like the local FS. You need to use the HDFS shell, the HDFS web UI, or the available APIs to do that.
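
For example, even listing a directory has to go through the API (or, equivalently, the shell's hadoop fs -ls). A minimal sketch, assuming / as the directory to list:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Equivalent to `hadoop fs -ls /`: the namespace lives in HDFS,
        // not in any directory you could browse on the local FS.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println((status.isDir() ? "d " : "- ") + status.getPath());
        }
    }
}
```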

HTH
