Hadoop fs -rm with regular expression
Problem description
I have a table with 9k partitions, of which I would like to delete about 1200 (representing 3 days).
I would like to combine hadoop fs -rm with a regular expression for these 3 days, something like pr_load_time=2017070([1-4])(\d+).
The partitions look like this (I want to match only the first two here)
pr_load_time=20170701000317
pr_load_time=20170704133602
pr_load_time=20170705000317
pr_load_time=20170706133602
Is something like this possible? I was thinking about matching the partitions with awk and using xargs, but this seems like a really slow approach to deleting such a large number of files.
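For reference, the grep/xargs route can be sketched as below. The regex filtering is demonstrated directly on the sample partition names from the question; in practice the list would come from `hadoop fs -ls`, and the `/hdfs_path` directory is a hypothetical placeholder:

```shell
# Stand-in for the real listing, which would be something like:
#   hadoop fs -ls /hdfs_path | awk '{print $NF}'
printf '%s\n' \
  /hdfs_path/pr_load_time=20170701000317 \
  /hdfs_path/pr_load_time=20170704133602 \
  /hdfs_path/pr_load_time=20170705000317 \
  /hdfs_path/pr_load_time=20170706133602 |
grep -E 'pr_load_time=2017070[1-4]'
# Only the 20170701... and 20170704... paths survive the filter;
# those lines could then be piped to:  xargs -n 100 hadoop fs -rm -r
```

Note that the per-file deletion step is what makes this slow for 1200 partitions, not the matching itself.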
I guess the comment above would solve your problem; however, you could try the following just in case:
/hdfs path/pr_load_time={20170701000317,20170704133602,20170705000317,..}
or something like this:
/hdfs path/pr_load_time=201707{01000317,04133602,05000317,..}
This can combine different patterns in a single command:
/hdfs path/pr_load_time=201707{01*,04*,05*,..}
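If the shell, rather than Hadoop's own glob handling, is expected to do the expansion, bash brace expansion produces the same set of patterns before `hadoop fs` ever sees them. A minimal sketch (the `/hdfs_path` directory is hypothetical, and `echo` is used here only to show what the expansion yields):

```shell
# Bash expands the braces into one argument per alternative;
# the trailing * is left quoted so Hadoop's glob matcher handles it.
echo /hdfs_path/pr_load_time=201707{01,04}'*'
# The real command would then be, e.g.:
#   hadoop fs -rm -r /hdfs_path/pr_load_time=201707{01,04}'*'
```

Deleting whole day prefixes this way (two glob arguments) is far faster than issuing one `hadoop fs -rm` per partition.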