Data block size in HDFS: why 64 MB?


Question

The default data block size of HDFS/Hadoop is 64 MB, while the block size on disk is generally 4 KB. What does a 64 MB block size mean? Does it mean that the smallest unit read from disk is 64 MB?

If yes, what is the advantage of doing that? Is it to make sequential access to large files in HDFS easier?

Could we achieve the same thing by using the disk's original 4 KB block size?

Answer

What does 64MB block size mean?

The block size is the smallest unit of data that a file system can store. If you store a file that's 1 KB or 60 MB, it'll take up one block. Once you cross the 64 MB boundary, you need a second block.
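As a quick illustration of that boundary rule, here is a minimal sketch in plain Python (not from the original answer; the file sizes are made up for the example):

    import math

    BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the classic HDFS default

    def blocks_needed(file_size_bytes: int) -> int:
        """Number of HDFS blocks a file of this size occupies."""
        return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

    print(blocks_needed(1 * 1024))              # 1 KB file  -> 1 block
    print(blocks_needed(60 * 1024 * 1024))      # 60 MB file -> 1 block
    print(blocks_needed(64 * 1024 * 1024 + 1))  # just past 64 MB -> 2 blocks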

If yes, what is the advantage of doing that?

HDFS is meant to handle large files. Let's say you have a 1,000 MB file. With a 4 KB block size, you'd have to make 256,000 requests to get that file (one request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the NameNode to figure out where that block can be found. That's a lot of traffic! If you use 64 MB blocks, the number of requests goes down to 16, greatly reducing the overhead and the load on the NameNode.
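The request counts in that paragraph are simple arithmetic; a small sketch to verify them (plain Python, illustrative only):

    import math

    KB = 1024
    MB = 1024 * KB
    FILE_SIZE = 1000 * MB  # the 1,000 MB file from the example above

    # One NameNode lookup per block:
    print(math.ceil(FILE_SIZE / (4 * KB)))   # 256000 requests with 4 KB blocks
    print(math.ceil(FILE_SIZE / (64 * MB)))  # 16 requests with 64 MB blocks

If you do need a different block size, it is configurable in HDFS (the dfs.blocksize property, dfs.block.size in older releases), but values anywhere near 4 KB would reintroduce exactly the NameNode overhead described above.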

