Data block size in HDFS: why 64MB?
Problem Description
The default data block size of HDFS/Hadoop is 64MB, while the block size on disk is generally 4KB. What does a 64MB block size mean? Does it mean that the smallest unit read from disk is 64MB?
If so, what is the advantage of doing that? Easier sequential access to large files in HDFS?
Can we achieve the same thing by keeping the disk's original 4KB block size?
Recommended Answer
What does 64MB block size mean?
The block size is the smallest unit of data that a file system can store. If you store a file that is 1KB or 60MB, it will take up one block. Once you cross the 64MB boundary, you need a second block.
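To make the boundary behaviour concrete, here is a minimal sketch (plain Java, no Hadoop dependencies; the 64MB figure is simply the default discussed above) that computes how many blocks a file of a given size would need:

    public class BlockCount {
        // The default HDFS block size discussed in the question: 64 MB.
        static final long BLOCK_SIZE = 64L * 1024 * 1024;

        // A non-empty file needs ceil(fileSize / blockSize) blocks.
        static long blocksFor(long fileSizeBytes) {
            return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
        }

        public static void main(String[] args) {
            long[] sizes = {1024L, 60L * 1024 * 1024, 64L * 1024 * 1024, 65L * 1024 * 1024};
            for (long size : sizes) {
                System.out.printf("%,d bytes -> %d block(s)%n", size, blocksFor(size));
            }
            // 1 KB -> 1 block, 60 MB -> 1 block, 64 MB -> 1 block, 65 MB -> 2 blocks
        }
    }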
If yes, what is the advantage of doing that?
HDFS is meant to handle large files. Let's say you have a 1000MB file. With a 4KB block size, you would have to make 256,000 requests to get that file (one request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the NameNode to determine where that block can be found. That is a lot of traffic! If you use 64MB blocks, the number of requests goes down to 16, greatly reducing the overhead and the load on the NameNode.
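The request arithmetic in the answer can be reproduced with the same kind of calculation. A minimal sketch (plain Java; the 1000MB file and the 4KB/64MB block sizes are the figures quoted in the answer, not measured values):

    public class RequestCount {
        // One block-location request is needed per block of the file.
        static long requests(long fileSizeBytes, long blockSizeBytes) {
            return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
        }

        public static void main(String[] args) {
            long fileSize   = 1000L * 1024 * 1024; // the 1000 MB file from the answer
            long smallBlock = 4L * 1024;           // 4 KB, a typical disk block size
            long hdfsBlock  = 64L * 1024 * 1024;   // 64 MB, the HDFS default discussed here

            System.out.println("4 KB blocks:  " + requests(fileSize, smallBlock) + " requests"); // 256,000
            System.out.println("64 MB blocks: " + requests(fileSize, hdfsBlock) + " requests");  // 16
        }
    }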