Large Block Size in HDFS! How is the unused space accounted for?
Problem description
We all know that the block size in HDFS is quite large (64 MB or 128 MB) compared with the block size of traditional file systems. This is done to reduce seek time as a percentage of transfer time: transfer rates have improved far more than disk seek times, so a file system design tries to minimize the number of seeks relative to the amount of data transferred. But large blocks bring the disadvantage of internal fragmentation, which is why traditional file systems keep block sizes small, on the order of a few KB (generally 4 KB or 8 KB).
I was reading Hadoop: The Definitive Guide and found a statement that a file smaller than the HDFS block size does not occupy a full block's worth of storage, but I couldn't understand how. Can somebody please shed some light on this?
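The seek-versus-transfer argument in the question can be made concrete with a little arithmetic. The sketch below is only illustrative: the 10 ms seek time and 100 MiB/s transfer rate are assumed round numbers, not figures from the question, and `seek_overhead` is a hypothetical helper name.

```python
# Assumed, illustrative disk characteristics (not taken from the question).
SEEK_TIME_S = 0.010                  # 10 ms average seek
TRANSFER_RATE_BPS = 100 * 1024**2    # 100 MiB/s sustained transfer


def seek_overhead(block_size_bytes: int) -> float:
    """Fraction of the total read time spent seeking when one block is read."""
    transfer_time = block_size_bytes / TRANSFER_RATE_BPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time)


for label, size in [("4 KiB", 4 * 1024),
                    ("64 MiB", 64 * 1024**2),
                    ("128 MiB", 128 * 1024**2)]:
    print(f"{label:>8}: {seek_overhead(size):.2%} of read time is seek")
```

Under these assumptions a 4 KiB block spends almost all of its read time seeking, while a 64 MiB or 128 MiB block spends only a percent or two.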
@Abhishek: The block division in HDFS is a logical layer built on top of the physical blocks of the underlying filesystem (e.g. ext3 or FAT). The filesystem is not physically divided into blocks of 64 MB or 128 MB (or whatever the block size may be). The HDFS block is just an abstraction for the metadata stored in the NameNode. Since the NameNode has to hold all of the metadata in memory, there is a limit on the number of metadata entries, which explains the need for a large block size.
Therefore, three 8 MB files stored on HDFS logically occupy 3 blocks (3 metadata entries in the NameNode) but physically occupy only 8 × 3 = 24 MB of space in the underlying filesystem.
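The arithmetic for the three small files can be sketched as follows. This is a minimal model, not HDFS code: `hdfs_footprint` is a hypothetical helper, the 64 MB block size is one of the values mentioned in the question, and replication is ignored, so the physical figure is per replica.

```python
MB = 1024 * 1024


def hdfs_footprint(file_sizes, block_size):
    """Return (logical_blocks, physical_bytes) for a list of file sizes.

    Each file needs ceil(size / block_size) block entries, but a block
    that is only partly filled takes up just the bytes actually written.
    """
    logical_blocks = sum(-(-size // block_size) for size in file_sizes)
    physical_bytes = sum(file_sizes)
    return logical_blocks, physical_bytes


blocks, used = hdfs_footprint([8 * MB] * 3, block_size=64 * MB)
print(f"logical blocks: {blocks}")
print(f"physical space: {used // MB} MB")
print(f"if every block were fully allocated: {blocks * 64} MB")
```

The last line shows what the question feared (3 × 64 = 192 MB); the model follows the answer, where only 24 MB is consumed.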
The large block size is chosen to make proper use of storage space while respecting the limit on the NameNode's memory.
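To get a feel for the NameNode memory limit, here is a rough estimate of how many block entries 1 TiB of data needs at different block sizes. The 150 bytes per block entry is an assumed rule-of-thumb figure, not a measured value, and the helper names are hypothetical. The sketch also ignores file and directory entries and assumes the data fills its blocks.

```python
BYTES_PER_BLOCK_ENTRY = 150  # assumption: rough rule of thumb, not measured


def block_entries(total_bytes: int, block_size: int) -> int:
    """Number of block entries needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


def namenode_bytes(total_bytes: int, block_size: int) -> int:
    """Estimated NameNode memory used by the block entries alone."""
    return block_entries(total_bytes, block_size) * BYTES_PER_BLOCK_ENTRY


ONE_TIB = 1024**4
for label, block_size in [("4 KiB", 4 * 1024), ("128 MiB", 128 * 1024**2)]:
    entries = block_entries(ONE_TIB, block_size)
    mem_mib = namenode_bytes(ONE_TIB, block_size) / 1024**2
    print(f"{label:>8} blocks: {entries:>11,} entries, ~{mem_mib:,.1f} MiB of metadata")
```

With these assumptions, 4 KiB blocks would need hundreds of millions of entries for a single tebibyte, while 128 MiB blocks need only a few thousand.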