How does HDFS calculate the available blocks?


Question

Assume the block size is 128 MB and the cluster has 10 GB of capacity (so about 80 available blocks). Suppose I have created 10 small files which together take 128 MB on disk (block files, checksums, replication...) and occupy 10 HDFS blocks. If I want to add another small file to HDFS, which figure does HDFS use to calculate the available blocks: the number of blocks used or the actual disk usage?

80 blocks - 10 blocks = 70 available blocks, or (10 GB - 128 MB) / 128 MB = 79 available blocks?

Thanks.

Recommended answer

Block size is just an indication to HDFS of how to split up and distribute files across the cluster. There is no physically reserved number of blocks in HDFS (you can change the block size for each individual file if you wish).

For your example, you also need to take the replication factor and checksum files into consideration. Essentially, though, adding lots of small files (smaller than the block size) does not mean that you have wasted 'available blocks': each file takes up only as much room as it needs (remembering that replication increases the physical footprint required to store it), so the number of 'available blocks' will be closer to your second calculation.
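To make the difference between the two calculations concrete, here is a minimal Python sketch of the arithmetic. It is not how HDFS itself reports capacity. The function names are hypothetical, and the checksum figures (4 bytes per 512 bytes of data) are an assumption about the usual defaults. The file sizes (10 files of 4 MB each, replication factor 3) are made up to land near the roughly 128 MB of disk usage in the question.

```python
MB = 1024 * 1024
GB = 1024 * MB


def blocks_by_count(capacity, block_size, blocks_used):
    """First calculation: treat every used block as a fully reserved slot."""
    return capacity // block_size - blocks_used


def blocks_by_bytes(capacity, block_size, bytes_used):
    """Second calculation: divide the bytes actually left by the block size."""
    return (capacity - bytes_used) // block_size


def physical_footprint(file_sizes, replication,
                       bytes_per_checksum=512, checksum_size=4):
    """Estimate raw disk usage: data plus checksums, times replication.

    The checksum parameters are assumed defaults, and the small
    per-file header of the checksum file is ignored.
    """
    total = 0
    for size in file_sizes:
        chunks = -(-size // bytes_per_checksum)  # ceiling division
        total += (size + chunks * checksum_size) * replication
    return total


capacity = 10 * GB        # raw cluster capacity from the question
block_size = 128 * MB

by_count = blocks_by_count(capacity, block_size, blocks_used=10)
by_bytes = blocks_by_bytes(capacity, block_size, bytes_used=128 * MB)
print(by_count, by_bytes)  # 70 79

# Hypothetical workload: 10 files of 4 MB each, replication factor 3.
used = physical_footprint([4 * MB] * 10, replication=3)
print(used / MB)                                   # 120.9375
print(blocks_by_bytes(capacity, block_size, used))  # 79
```

Under these assumptions the ten small files cost about 121 MB of raw disk rather than ten full 128 MB blocks, which is why the byte-based figure of 79 is the closer estimate.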

A final note: having lots of small files means that your name node will require more memory to track them (block sizes, locations, etc.), and it's generally less efficient to process 128 x 1 MB files than a single 128 MB file (although that depends on how you're processing them).
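The name node cost can be sketched the same way. The snippet below counts the objects the name node would have to track (one per file plus one per block). It then multiplies by 150 bytes per object, an often-quoted rule of thumb that is assumed here, not a measured value and not something the answer states. The function name is hypothetical.

```python
MB = 1024 * 1024
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value


def namenode_objects(num_files, file_size, block_size):
    """Count tracked objects: one per file plus one per block of each file."""
    blocks_per_file = max(1, -(-file_size // block_size))  # ceiling division
    return num_files * (1 + blocks_per_file)


block_size = 128 * MB
many_small = namenode_objects(128, 1 * MB, block_size)   # 128 files of 1 MB
one_large = namenode_objects(1, 128 * MB, block_size)    # 1 file of 128 MB

print(many_small, one_large)  # 256 2
print(many_small * BYTES_PER_OBJECT, one_large * BYTES_PER_OBJECT)  # 38400 300
```

The absolute numbers are tiny at this scale. The point is the ratio: the same 128 MB of data needs 128 times as many tracked objects when it is stored as small files.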
