Hadoop & Bash: delete filenames matching range
Problem description
Say you have a list of files in HDFS with a common prefix and an incrementing suffix. For example,
part-1.gz, part-2.gz, part-3.gz, ..., part-50.gz
I only want to leave a few files in the directory, say 3. Any three files will do; they will be used for testing, so the choice of files doesn't matter.
What's the simplest & fastest way to delete the other 47 files?
A few options here:
Move three files manually over to a new folder, then delete the old folder.
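A minimal sketch of that approach, assuming the files live in /path/to/files (the /path/to/keep name is just a placeholder, and on older Hadoop releases -rm -r is spelled -rmr):

hadoop fs -mkdir /path/to/keep
hadoop fs -mv /path/to/files/part-1.gz /path/to/files/part-2.gz /path/to/files/part-3.gz /path/to/keep
hadoop fs -rm -r /path/to/files

hadoop fs -mv accepts multiple source paths when the destination is a directory, so the three keepers move in one command.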
Grab the file names with fs -ls, then pull the top n and rm them. This is the most robust method, in my opinion.
hadoop fs -ls /path/to/files
gives you the full ls output
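For reference, that listing looks roughly like the following (the owner, group, sizes and timestamps here are made up), which is why the next command picks field 8, the path column:

Found 50 items
-rw-r--r--   3 hadoop supergroup    1048576 2012-06-01 10:15 /path/to/files/part-1.gz
-rw-r--r--   3 hadoop supergroup    1048576 2012-06-01 10:15 /path/to/files/part-2.gz
...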
hadoop fs -ls /path/to/files | grep 'part' | awk '{print $8}'
prints out only the file names (adjust the grep accordingly to grab the files you want).
hadoop fs -ls /path/to/files | grep 'part' | awk '{print $8}' | head -n47
grabs the top 47
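If the directory doesn't always hold exactly 50 files, you can compute the cutoff instead of hardcoding 47. A sketch that keeps the last 3 entries of whatever the listing returns:

n=$(hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | wc -l)
hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n $((n - 3))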
Throw this into a for loop and rm them:
for k in `hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47`
do
hadoop fs -rm "$k"
done
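One caveat with the backtick form: the shell glob-expands each word it produces, so a path containing * could surprise you. A sketch of an equivalent loop that streams the names line by line and quotes them instead:

hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47 | while read -r k
do
hadoop fs -rm "$k"
done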
Instead of a for loop, you could use xargs:
hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47 | xargs hadoop fs -rm
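To sanity-check before deleting anything, you can stick echo in front of the command xargs runs; this prints the assembled command line instead of executing it:

hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47 | xargs echo hadoop fs -rm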
Thanks to Keith for the inspiration.