Why is there no 'hadoop fs -head' shell command?


Question

A fast method for inspecting files on HDFS is to use tail:

~$ hadoop fs -tail /path/to/file

This displays the last kilobyte of data in the file, which is extremely helpful. However, the opposite command, head, does not appear to be part of the shell command collection. I find this very surprising.

My hypothesis is that since HDFS is built for very fast streaming reads over very large files, there is some access-oriented issue that affects head. This makes me hesitant to try anything that reads the head of a file. Does anyone have an answer?

Answer

I would say it's more to do with efficiency: a head can easily be replicated by piping the output of hadoop fs -cat through the Linux head command.

hadoop fs -cat /path/to/file | head

This is efficient, as head will close the underlying stream after the desired number of lines has been output.
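
To make that early-exit behavior concrete, here is a minimal Java sketch using the standard org.apache.hadoop.fs API (an illustration, not the actual FsShell source). It does roughly what cat | head achieves, reading only the first kilobyte and then closing the stream, so blocks beyond the first are never fetched (head itself counts lines rather than bytes, but the principle is the same):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHead {
    public static void main(String[] args) throws IOException {
        // Open the default filesystem (HDFS, given a standard cluster config).
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] buf = new byte[1024];
        // try-with-resources closes the stream as soon as we have our
        // kilobyte, which is what the broken pipe from an exiting head
        // process achieves for hadoop fs -cat.
        try (FSDataInputStream in = fs.open(new Path(args[0]))) {
            int n = in.read(buf, 0, buf.length);
            if (n > 0) {
                System.out.write(buf, 0, n);
                System.out.flush();
            }
        }
    }
}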

Using tail in this manner would be considerably less efficient, as you'd have to stream the entire file (all HDFS blocks) just to find the final x lines.

hadoop fs -cat /path/to/file | tail

The hadoop fs -tail command you note works on the last kilobyte: hadoop can efficiently find the last block, skip to the position of the final kilobyte, and then stream the output. Piping via tail can't easily do this.
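
For contrast, here is a minimal sketch of that seek-based approach, again using the standard org.apache.hadoop.fs API (an assumption about how -tail can work, not the actual FsShell source): look up the file length, seek straight to length - 1024, and stream only the final kilobyte.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsTail {
    public static void main(String[] args) throws IOException {
        Path path = new Path(args[0]);
        FileSystem fs = FileSystem.get(new Configuration());
        // The NameNode already knows the file length, so this is a
        // cheap metadata call rather than a scan of the data.
        long len = fs.getFileStatus(path).getLen();
        long start = Math.max(0, len - 1024);
        try (FSDataInputStream in = fs.open(path)) {
            // Jump directly into the last block; earlier blocks are never read.
            in.seek(start);
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}

A pipe into the Linux tail gets no such shortcut: the pipe only ever sees a forward stream of bytes, so every block has to cross the network before tail can discard all but the end.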
