data block size in HDFS, why 64MB?


Question

The default data block size of HDFS/Hadoop is 64MB. The block size on disk is generally 4KB.

What does 64MB block size mean? -> Does it mean that the smallest unit read from disk is 64MB?

If yes, what is the advantage of doing that? -> Does it make continuous access of large files in HDFS easier?

Can we do the same by using the disk's original 4KB block size?

Answer

What does 64MB block size mean?

The block size is the smallest data unit that a file system can store. If you store a file that's 1KB or 60MB, it'll take up one block. Once you cross the 64MB boundary, you need a second block.
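
You can see this from the client side by asking HDFS for a file's block size and block list through the Java FileSystem API. A minimal sketch, assuming a cluster reachable at hdfs://namenode:9000 and a file /data/example.txt (both hypothetical, replace with your own):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's URI.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/example.txt"); // hypothetical path
        FileStatus status = fs.getFileStatus(file);
        // Block size configured for this file, e.g. 67108864 (64MB).
        System.out.println("Block size: " + status.getBlockSize());

        // Each BlockLocation is one HDFS block: a 60MB file yields one
        // entry, a 65MB file yields two.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        System.out.println("Number of blocks: " + blocks.length);
    }
}

The 64MB figure itself is the old dfs.block.size default in hdfs-site.xml; newer Hadoop releases call the property dfs.blocksize and default to 128MB.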

If yes, what is the advantage of doing that?

HDFS is meant to handle large files. Let's say you have a 1000MB file. With a 4KB block size, you'd have to make 256,000 requests to get that file (one request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the NameNode to determine where that block can be found. That's a lot of traffic! If you use 64MB blocks, the number of requests goes down to 16, significantly reducing the overhead and load on the NameNode.
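
The request counts above are just a ceiling division of file size by block size, since one block lookup is needed per block. A small sketch of the arithmetic (plain Java, no HDFS required):

public class BlockCount {
    public static void main(String[] args) {
        long fileSize = 1000L * 1024 * 1024; // 1000MB file

        long smallBlock = 4L * 1024;          // 4KB
        long largeBlock = 64L * 1024 * 1024;  // 64MB

        // Number of blocks = ceil(fileSize / blockSize); each block
        // costs one NameNode lookup.
        System.out.println(blocks(fileSize, smallBlock)); // prints 256000
        System.out.println(blocks(fileSize, largeBlock)); // prints 16
    }

    static long blocks(long fileSize, long blockSize) {
        // Integer ceiling division without floating point.
        return (fileSize + blockSize - 1) / blockSize;
    }
}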

