How to list only the file names in HDFS

Question

I would like to know whether there is any command or expression to get only the file names in Hadoop. I need to fetch only the name of each file, but when I run hadoop fs -ls it prints the whole path.

I tried the following, but I'm wondering whether there is a better way to do it:

hadoop fs -ls <HDFS_DIR>|cut -d ' ' -f17 

Answer

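It seems hadoop fs -ls does not support any option to output just the file names, or even just the last column. (Newer Hadoop releases add hadoop fs -ls -C, which prints only the paths, but it still prints the full path rather than the bare file name.)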

If you want to get the last column reliably, you should first squeeze each run of whitespace into a single space, so that you can then address the last column by its field number:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8
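
To see which field the path ends up in, here is a minimal sketch, assuming a POSIX shell with sed and cut. The functions sample_ls and ls_paths are hypothetical names, and sample_ls only prints made-up output in the usual hadoop fs -ls layout, so nothing here needs a cluster:

```shell
# Hypothetical stand-in for `hadoop fs -ls /tmp` so the pipeline can be tried
# without a cluster; the values and column padding are made up.
sample_ls() {
  printf '%s\n' \
    'Found 2 items' \
    '-rw-r--r--   3 hdfs supergroup       1366 2016-03-09 10:12 /tmp/a.txt' \
    'drwxr-xr-x   - hdfs supergroup          0 2016-03-09 10:13 /tmp/logs'
}

# Hypothetical helper wrapping the pipeline from the answer:
#   1d         drop the "Found N items" header line
#   s/  */ /g  squeeze each run of spaces into a single space
#   cut -f8    the path is then the 8th space-separated field
ls_paths() {
  sed '1d;s/  */ /g' | cut -d' ' -f8
}

sample_ls | ls_paths
# prints:
# /tmp/a.txt
# /tmp/logs
```

On a real cluster you would run hadoop fs -ls <HDFS_DIR> | ls_paths instead. The path is the 8th field because seven columns precede it: permissions, replication (shown as - for directories), owner, group, size, date and time.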

This gets you just the last column, but each file still comes with its whole path. If you want just the file names, you can use basename as @rojomoke suggests:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8 | xargs -n 1 basename
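
xargs -n 1 basename starts one basename process per file. A swapped-in alternative (not what @rojomoke suggested) is to let sed delete everything up to and including the last / in a single process. A small sketch, with hypothetical helper names and made-up input paths:

```shell
# Hypothetical input: paths as they come out of the sed | cut pipeline.
sample_paths() {
  printf '%s\n' /tmp/a.txt /tmp/logs /user/alice/data/part-00000
}

# Hypothetical helper: same effect as `xargs -n 1 basename` for these paths,
# but done in one sed process. The greedy .*/ matches up to the last slash.
to_basename() {
  sed 's#.*/##'
}

sample_paths | to_basename
# prints:
# a.txt
# logs
# part-00000
```

Like the cut pipeline it follows, this does not help with names containing runs of spaces, because those have already been squeezed before the basename step.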

I also filtered out the first line, which says "Found N items".

Note: as @felix-frank points out in the comments, the above command will not correctly preserve file names that contain multiple consecutive spaces. Felix proposed a more correct solution:

hadoop fs -ls /tmp | sed 1d | perl -wlne'print +(split " ",$_,8)[7]'