How to count lines in a file on hdfs command?
Question
I have a file (testfile) on HDFS and I want to know how many lines it contains.
In Linux, I can do:
wc -l <filename>
Can I do something similar with a "hadoop fs" command? I can print the file contents with:
hadoop fs -text /user/mklein/testfile
How do I find out how many lines it has? I want to avoid copying the file to the local filesystem and then running the wc command.
Note: My file is compressed with Snappy, which is why I have to use -text instead of -cat.
Answer
You cannot do it with a hadoop fs command alone. You either have to write a MapReduce job with the logic explained in this post, or the following Pig script will help:
A = LOAD 'file' USING PigStorage() AS (...);
B = GROUP A ALL;
cnt = FOREACH B GENERATE COUNT(A);
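As written, the script computes the count but never outputs it, so a DUMP (or STORE) statement is needed to actually see the result. A minimal sketch of running it from the shell, assuming the script is saved as count_lines.pig (a hypothetical file name) with a DUMP cnt; line appended:
pig count_lines.pig
The job output will contain a single tuple holding the line count.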
Make sure your Snappy file has the correct extension so that Pig can detect and read it.
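As a side note, the decompressed output of -text can also be piped straight into the local wc command; the file itself is never copied to local disk, although every byte still streams through the client. A minimal sketch, using the path from the question:
hadoop fs -text /user/mklein/testfile | wc -l
For very large files the Pig or MapReduce approach is preferable, since it counts in parallel on the cluster instead of funneling all the data through one machine.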