Hadoop fs -rm with regular expression


Problem description

I have a table with 9k partitions, of which I would like to delete about 1200 (representing 3 days).
I would like to combine hadoop fs -rm with a regular expression for these 3 days, something like pr_load_time=2017070([1-4])(\d+).

The partitions look like this (I want to match only the first two here):

pr_load_time=20170701000317
pr_load_time=20170704133602
pr_load_time=20170705000317
pr_load_time=20170706133602

Is something like this possible? I was thinking about matching the partitions with awk and using xargs, but that seems like a very slow approach to deleting such a large number of files.
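
For reference, a rough sketch of that awk/xargs idea, assuming a hypothetical table location /user/hive/warehouse/mytable (the real path isn't given here) and an arbitrary batch size:

# List the partition directories, keep only days 01-04 of July 2017,
# and hand the matching paths to "hadoop fs -rm -r" in batches of 100.
# The table path and batch size below are placeholders.
hadoop fs -ls /user/hive/warehouse/mytable \
  | awk '$NF ~ /pr_load_time=2017070[1-4]/ {print $NF}' \
  | xargs -n 100 hadoop fs -rm -r

Each batch launches a fresh hadoop client JVM, which is the main reason this approach is slow for a large number of partitions.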

Solution

I guess the comment above would solve your problem, but in case it helps, you could also try the following (hadoop fs expands glob patterns like these, rather than regular expressions):

/hdfs path/pr_load_time={20170701000317,20170704133602,20170705000317,..}

or something like this:

/hdfs path/pr_load_time=201707{01000317,04133602,05000317,..}

This can combine different patterns in a single command:

/hdfs path/pr_load_time=201707{01*,04*,05*,..}
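Putting the last variant together into a complete command, a minimal sketch (the "/hdfs path/" prefix is the placeholder used above, and the listed days are illustrative):

# Quoting the pattern lets the hadoop client expand the braces and
# wildcard itself against HDFS, deleting all matches in one invocation.
hadoop fs -rm -r "/hdfs path/pr_load_time=201707{01,04,05}*"

Since only one client process is started and the pattern is matched in a single pass over the directory listing, this is far faster than deleting the partitions batch by batch via xargs.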
