Small files and HDFS blocks


Question


Does a block in Hadoop Distributed File System store multiple small files, or a block stores only 1 file?

Solution

Multiple files are not stored in a single block. BTW, a single file can be stored in multiple blocks. The mapping between the file and the block-ids is persisted in the NameNode.
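To make the file-to-block relationship concrete, here is a minimal sketch of how a file is cut into blocks and how a NameNode-style table maps each file to its own block ids. `ToyNameNode` and its methods are hypothetical illustrations, not the real HDFS API, and the 128 MiB block size is an assumption (the usual `dfs.blocksize` default in recent Hadoop releases).

```python
from dataclasses import dataclass, field

MiB = 1024 * 1024
BLOCK_SIZE = 128 * MiB  # assumed block size; configurable in a real cluster


@dataclass
class ToyNameNode:
    """Hypothetical toy model of the NameNode's file -> block-id mapping."""
    block_size: int = BLOCK_SIZE
    next_block_id: int = 1
    file_to_blocks: dict = field(default_factory=dict)

    def add_file(self, path: str, size: int) -> list:
        """Split a file of `size` bytes into blocks; return [(block_id, length), ...]."""
        blocks = []
        remaining = size
        while remaining > 0:
            length = min(remaining, self.block_size)  # the last block may be partial
            blocks.append((self.next_block_id, length))
            self.next_block_id += 1
            remaining -= length
        # Every block belongs to exactly one file; no other file's data is added to it.
        self.file_to_blocks[path] = blocks
        return blocks


nn = ToyNameNode()
big = nn.add_file("/data/big.log", 300 * MiB)  # spans several blocks
small = nn.add_file("/data/small.txt", 1024)   # gets a block of its own
print(len(big), [length // MiB for _, length in big])
print(small)
```

A 300 MiB file ends up as two full 128 MiB blocks plus a 44 MiB tail block, while the 1 KiB file still receives its own separate block id rather than sharing the tail block of the larger file.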

According to Hadoop: The Definitive Guide:

Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
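The quoted point can be checked with simple arithmetic. The sketch below compares a hypothetical scheme where every HDFS block reserves its full size on disk against what actually happens: each block replica is stored as an ordinary file on the DataNode's local filesystem, so only that filesystem's small allocation unit is rounded up. The 128 MiB block size, 4 KiB local allocation unit and replication factor of 3 are assumptions, and checksum `.meta` files are ignored.

```python
MiB = 1024 * 1024
HDFS_BLOCK = 128 * MiB  # assumed dfs.blocksize
LOCAL_FS_BLOCK = 4096   # assumed allocation unit of the DataNode's local filesystem
REPLICATION = 3         # assumed dfs.replication


def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit (ceiling division)."""
    return -(-n // unit) * unit


def full_block_usage(size: int) -> int:
    """Hypothetical: raw storage if every HDFS block occupied its full size."""
    return round_up(size, HDFS_BLOCK) * REPLICATION


def hdfs_usage(size: int) -> int:
    """Raw storage when each replica is a local file sized to its data
    (only the local filesystem's small block is rounded up)."""
    return round_up(size, LOCAL_FS_BLOCK) * REPLICATION


size = 1 * MiB
print(full_block_usage(size) // MiB, "MiB vs", hdfs_usage(size) // MiB, "MiB")
```

For a 1 MiB file this gives 384 MiB under the hypothetical full-block scheme versus 3 MiB in practice: small files waste little disk space in HDFS, so the real cost of small files lies elsewhere.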

HDFS is designed to handle large files. If there are too many small files, the NameNode can become overloaded, because it keeps the entire HDFS namespace (every file, directory and block) in memory. Check this article on how to alleviate the problem of too many small files.
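A rough estimate shows why the NameNode, not disk space, is the bottleneck. This sketch stores the same 10 GiB of data as files of different sizes and counts the file and block objects the NameNode must track. The figure of about 150 bytes of NameNode heap per object is a commonly quoted rule of thumb, not an exact value; the 128 MiB block size is assumed and directories are ignored.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
BLOCK_SIZE = 128 * MiB  # assumed dfs.blocksize
BYTES_PER_OBJECT = 150  # rule-of-thumb heap cost per file or block object


def namespace_objects(total_bytes: int, file_size: int) -> int:
    """Files + blocks tracked when total_bytes is split into equal-sized files."""
    n_files = total_bytes // file_size
    blocks_per_file = -(-file_size // BLOCK_SIZE)  # ceiling division
    return n_files * (1 + blocks_per_file)


cases = [("one 10 GiB file", 10 * GiB), ("1 MiB files", MiB), ("10 KiB files", 10 * KiB)]
for label, file_size in cases:
    objs = namespace_objects(10 * GiB, file_size)
    heap_mib = objs * BYTES_PER_OBJECT / MiB
    print(f"{label:>16}: {objs:>9,} objects, ~{heap_mib:.2f} MiB of NameNode heap")
```

Under these assumptions the same 10 GiB goes from 81 objects (about 0.01 MiB of heap) as a single file to roughly 2.1 million objects (about 300 MiB of heap) as 10 KiB files, even though the data on disk is identical.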
