Hadoop & Bash: delete filenames matching range
Problem description
Say you have a list of files in HDFS with a common prefix and an incrementing suffix. For example,
part-1.gz, part-2.gz, part-3.gz, ..., part-50.gz
I only want to leave a few files in the directory, say 3. Any three files will do; they will be used for testing, so the choice of files doesn't matter.
What's the simplest & fastest way to delete the other 47 files?
A few options here:
Move three files manually over to a new folder, then delete the old folder.
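A minimal sketch of that approach, assuming the files live in /path/to/files (the /path/to/keep name is just a placeholder, and on older Hadoop releases -rm -r is spelled -rmr):

hadoop fs -mkdir /path/to/keep
hadoop fs -mv /path/to/files/part-1.gz /path/to/files/part-2.gz /path/to/files/part-3.gz /path/to/keep
hadoop fs -rm -r /path/to/files

hadoop fs -mv accepts multiple source paths when the destination is a directory, so the three keepers move in one command.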
Grab the file names with fs -ls, then pull the top n and rm them. This is the most robust method, in my opinion.
hadoop fs -ls /path/to/files
gives you the full ls output
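For reference, that listing looks roughly like the following (the owner, group, sizes and timestamps here are made up), which is why the next command picks field 8, the path column:

Found 50 items
-rw-r--r--   3 hadoop supergroup    1048576 2012-06-01 10:15 /path/to/files/part-1.gz
-rw-r--r--   3 hadoop supergroup    1048576 2012-06-01 10:15 /path/to/files/part-2.gz
...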
hadoop fs -ls /path/to/files | grep 'part' | awk '{print $8}'
prints out only the file names (adjust the grep accordingly to grab the files you want).
hadoop fs -ls /path/to/files | grep 'part' | awk '{print $8}' | head -n47
grabs the top 47
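If the directory doesn't always hold exactly 50 files, you can compute the cutoff instead of hardcoding 47. A sketch that keeps the last 3 entries of whatever the listing returns:

n=$(hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | wc -l)
hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n $((n - 3))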
Throw this into a for loop and rm them:
for k in `hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47`
do
hadoop fs -rm "$k"
done
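One caveat with the backtick form: the shell glob-expands each word it produces, so a path containing * could surprise you. A sketch of an equivalent loop that streams the names line by line and quotes them instead:

hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47 | while read -r k
do
hadoop fs -rm "$k"
done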
Instead of a for loop, you could use xargs:
hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47 | xargs hadoop fs -rm
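To sanity-check before deleting anything, you can stick echo in front of the command xargs runs; this prints the assembled command line instead of executing it:

hadoop fs -ls /path/to/files | grep part | awk '{print $8}' | head -n47 | xargs echo hadoop fs -rm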
Thanks to Keith for the inspiration.