Small files and HDFS blocks

Question

Does a block in the Hadoop Distributed File System (HDFS) store multiple small files, or does a block store only one file?

Recommended answer

Multiple files are never stored in a single block; a block belongs to exactly one file. A single file, on the other hand, can be stored across multiple blocks. The mapping between a file and its block IDs is persisted in the NameNode.
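To make the file-to-block relationship concrete, here is a minimal sketch in plain Python. It is a toy model, not HDFS code: the names `blocks_for_file` and `build_block_map`, the sequential block IDs, and the 128 MiB block size are all assumptions chosen for illustration (the real block size is a cluster setting).

```python
# Toy model of the NameNode's file -> block-ID mapping (illustrative only).
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size; configurable in a real cluster


def blocks_for_file(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of `file_size` bytes maps to (0 for an empty file)."""
    return -(-file_size // block_size)  # ceiling division on integers


def build_block_map(files, block_size=BLOCK_SIZE):
    """Assign hypothetical sequential block IDs; no block is shared between files."""
    block_map = {}
    next_id = 0
    for path, size in files.items():
        count = blocks_for_file(size, block_size)
        block_map[path] = list(range(next_id, next_id + count))
        next_id += count
    return block_map


if __name__ == "__main__":
    files = {
        "/logs/a.log": 1024,                  # 1 KiB, gets its own block
        "/logs/b.log": 2048,                  # 2 KiB, gets its own block
        "/data/big.bin": 300 * 1024 * 1024,   # 300 MiB, spans three blocks
    }
    for path, block_ids in build_block_map(files).items():
        print(path, block_ids)
```

In this model the two tiny log files each get a separate block, while the 300 MiB file is split across three blocks, which mirrors the behaviour described in the answer.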

According to Hadoop: The Definitive Guide:

Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
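The quoted point can be illustrated with a little arithmetic. The sketch below assumes a 128 MiB block size and a replication factor of 3, both of which are configurable; `block_lengths` and `raw_bytes_stored` are hypothetical helper names, not part of any Hadoop API.

```python
# Illustrative arithmetic only: how many bytes each block of a file actually holds.
MiB = 1024 * 1024
BLOCK_SIZE = 128 * MiB  # assumed block size


def block_lengths(file_size, block_size=BLOCK_SIZE):
    """Bytes held by each block of a file; only the last block may be partial."""
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)
    return lengths


def raw_bytes_stored(file_size, replication=3, block_size=BLOCK_SIZE):
    """Approximate raw disk usage across the cluster: actual bytes times replicas."""
    return sum(block_lengths(file_size, block_size)) * replication


if __name__ == "__main__":
    small = 1 * MiB
    print(block_lengths(small))      # one block holding 1 MiB, not 128 MiB
    print(raw_bytes_stored(small))   # 3 MiB in total for three replicas
    print(block_lengths(300 * MiB))  # two full blocks plus a 44 MiB tail
```

Under these assumptions, a 1 MiB file costs about 3 MiB of raw disk across its replicas rather than 3 x 128 MiB, so small files do not waste disk space by padding out blocks.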

HDFS is designed to handle large files. If there are too many small files, the NameNode can become overloaded, since it stores the namespace for HDFS and every file and block adds an entry to it. Check this article on how to alleviate the problem with too many small files.
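A rough back-of-the-envelope estimate shows why many small files put pressure on the NameNode. The sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per namespace object (file or block); that figure is an approximation, not a measured value, and the function names are hypothetical.

```python
# Rough estimate of NameNode memory for the same data stored in different ways.
MiB = 1024 * 1024
BLOCK_SIZE = 128 * MiB   # assumed block size
BYTES_PER_OBJECT = 150   # assumed rule-of-thumb cost per file or block entry


def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count namespace objects: one per file plus one per block (directories ignored)."""
    blocks = sum(-(-size // block_size) for size in file_sizes)
    return len(file_sizes) + blocks


def estimated_heap_bytes(file_sizes, block_size=BLOCK_SIZE):
    """Approximate NameNode memory under the per-object assumption above."""
    return namenode_objects(file_sizes, block_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_large = [1024 * MiB]      # 1 GiB stored as a single file (8 blocks)
    many_small = [1 * MiB] * 1024  # the same 1 GiB stored as 1024 small files
    print(namenode_objects(one_large), estimated_heap_bytes(one_large))
    print(namenode_objects(many_small), estimated_heap_bytes(many_small))
```

With these assumptions, the single 1 GiB file needs 9 namespace objects, while the same data split into 1024 files of 1 MiB needs 2048 objects, more than 200 times as many. The disk usage is the same in both cases; it is the NameNode's metadata that grows.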
