HDFS File System - How to get the byte count for a specific file in a directory
Question
I am trying to get the byte count for a specific file in an HDFS directory.
I tried to use fs.getFileStatus(), but I do not see any method for getting the byte count of the file; I can only see the getBlockSize() method.
Is there any way I can get the byte count of a specific file in HDFS?
Answer
fs.getFileStatus() returns a FileStatus object, which has a getLen() method that returns the "length of this file, in bytes." Maybe you should have a closer look at this: https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileStatus.html
BUT be aware that the file size is not that important on HDFS. Files are organized into so-called data blocks; each block is 64 MB by default in older releases (128 MB since Hadoop 2.x). So if you deal with many small files (one big anti-pattern on HDFS), you may have less capacity than you expect. See this link for more details:
https://hadoop.apache.org/docs/r2.6.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Blocks
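As a quick back-of-the-envelope illustration of the block math (using the 64 MB default mentioned above), the number of block entries a file occupies is the ceiling of its length divided by the block size, and even a tiny file costs one block entry of NameNode metadata:

```java
public class BlockMath {
    // Number of block entries a file of `len` bytes occupies,
    // given `blockSize` bytes per block (a non-empty file uses at least one).
    static long blockCount(long len, long blockSize) {
        if (len == 0) return 0;
        return (len + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(blockCount(200 * mb, 64 * mb)); // 4 blocks (ceil of 200/64)
        System.out.println(blockCount(1024, 64 * mb));     // 1 block, even for a 1 KB file
    }
}
```

This is why many small files are an anti-pattern: 10,000 files of 1 KB each cost 10,000 block entries on the NameNode, while a single 10 MB file costs just one.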