Are multiple files stored in a single block?


Problem description



When I store many small files in HDFS, will they be stored in a single block?

My understanding, based on this discussion, is that these small files should end up in a single block: HDFS block size Vs actual file size

Solution

Quoting from Hadoop - The Definitive Guide:

HDFS stores small files inefficiently, since each file is stored in a block, and block metadata is held in memory by the namenode. Thus, a large number of small files can eat up a lot of memory on the namenode. (Note, however, that small files do not take up any more disk space than is required to store the raw contents of the file. For example, a 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB.) Hadoop Archives, or HAR files, are a file archiving facility that packs files into HDFS blocks more efficiently, thereby reducing namenode memory usage while still allowing transparent access to files.

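Conclusion: Each file is stored in its own block. HDFS does not pack several small files into one shared block, although a small file only uses as much disk space as its actual contents.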
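To make the arithmetic in the quote concrete, here is a minimal Python sketch. It is not HDFS code and calls no Hadoop API. The helper names `blocks_for_file` and `summarize` are made up for illustration, the 128 MB block size is taken from the quoted example, and replication is ignored.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size used in the quoted example


def blocks_for_file(size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks one file occupies; a block is never shared between files."""
    # Assumption for this sketch: an empty file occupies no blocks.
    return math.ceil(size_bytes / block_size) if size_bytes > 0 else 0


def summarize(file_sizes, block_size=BLOCK_SIZE):
    """Return (total_blocks, bytes_on_disk) for a list of file sizes, ignoring replication."""
    total_blocks = sum(blocks_for_file(s, block_size) for s in file_sizes)
    # A partly filled block only uses the bytes actually written to it.
    bytes_on_disk = sum(file_sizes)
    return total_blocks, bytes_on_disk


one_mb = 1024 * 1024
small_files = [one_mb] * 1000          # one thousand 1 MB files
blocks, disk = summarize(small_files)
print(blocks, disk // one_mb)          # prints: 1000 1000
```

Under these assumptions, one thousand 1 MB files produce 1000 blocks for the namenode to track while using only about 1000 MB of disk, whereas a single 1000 MB file would need just 8 blocks. That gap in block count, not disk usage, is the inefficiency the quote describes.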
