How to count lines in a file on hdfs command?
Question
I have a file (testfile) on HDFS and I want to know how many lines it contains.
In Linux, I can do:
wc -l <filename>
Can I do something similar with a "hadoop fs" command? I can print the file contents with:
hadoop fs -text /user/mklein/testfile
How do I find out how many lines it has? I want to avoid copying the file to the local filesystem and then running the wc command.
Note: My file is compressed with Snappy, which is why I have to use -text instead of -cat.
Answer
You cannot do it with a hadoop fs command alone. You either have to write a MapReduce job with the logic explained in this post, or the following Pig script will help:
A = LOAD 'file' USING PigStorage() AS (...);
B = GROUP A ALL;
cnt = FOREACH B GENERATE COUNT(A);
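As written, the script computes the count but never outputs it, so a DUMP (or STORE) statement is needed to actually see the result. A minimal sketch of running it from the shell, assuming the script is saved as count_lines.pig (a hypothetical file name) with a DUMP cnt; line appended:
pig count_lines.pig
The job output will contain a single tuple holding the line count.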
Make sure your Snappy file has the correct extension so that Pig can detect and read it.
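As a side note, the decompressed output of -text can also be piped straight into the local wc command; the file itself is never copied to local disk, although every byte still streams through the client. A minimal sketch, using the path from the question:
hadoop fs -text /user/mklein/testfile | wc -l
For very large files the Pig or MapReduce approach is preferable, since it counts in parallel on the cluster instead of funneling all the data through one machine.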