Hadoop fs -rm with regular expression
Problem description
I have a table with 9k partitions, of which I would like to delete about 1200 (representing 3 days).
I would like to combine hadoop fs -rm with a regular expression for these 3 days, something like pr_load_time=2017070([1-4])(\d+).
The partitions look like this (I want to match only the first two here)
pr_load_time=20170701000317
pr_load_time=20170704133602
pr_load_time=20170705000317
pr_load_time=20170706133602
Is something like this possible? I was thinking about matching the partitions with awk and using xargs, but this seems like a really slow approach to deleting such a large number of files.
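For reference, the grep/xargs route can be sketched as below. The regex filtering is demonstrated directly on the sample partition names from the question; in practice the list would come from `hadoop fs -ls`, and the `/hdfs_path` directory is a hypothetical placeholder:

```shell
# Stand-in for the real listing, which would be something like:
#   hadoop fs -ls /hdfs_path | awk '{print $NF}'
printf '%s\n' \
  /hdfs_path/pr_load_time=20170701000317 \
  /hdfs_path/pr_load_time=20170704133602 \
  /hdfs_path/pr_load_time=20170705000317 \
  /hdfs_path/pr_load_time=20170706133602 |
grep -E 'pr_load_time=2017070[1-4]'
# Only the 20170701... and 20170704... paths survive the filter;
# those lines could then be piped to:  xargs -n 100 hadoop fs -rm -r
```

Note that the per-file deletion step is what makes this slow for 1200 partitions, not the matching itself.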
I guess the comment above would solve your problem; however, you could try the following just in case:
/hdfs path/pr_load_time={20170701000317,20170704133602,20170705000317,..}
or something like this:
/hdfs path/pr_load_time=201707{01000317,04133602,05000317,..}
This can combine different patterns in a single command:
/hdfs path/pr_load_time=201707{01*,04*,05*,..}
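If the shell, rather than Hadoop's own glob handling, is expected to do the expansion, bash brace expansion produces the same set of patterns before `hadoop fs` ever sees them. A minimal sketch (the `/hdfs_path` directory is hypothetical, and `echo` is used here only to show what the expansion yields):

```shell
# Bash expands the braces into one argument per alternative;
# the trailing * is left quoted so Hadoop's glob matcher handles it.
echo /hdfs_path/pr_load_time=201707{01,04}'*'
# The real command would then be, e.g.:
#   hadoop fs -rm -r /hdfs_path/pr_load_time=201707{01,04}'*'
```

Deleting whole day prefixes this way (two glob arguments) is far faster than issuing one `hadoop fs -rm` per partition.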