How to extract multiple strings with a single regex expression in Awk
Problem description
I have the following strings:
Mike has XXX cats and XXXXX dogs.
MikehasXXXcatsandXXXXXdogs
I would like to replace each run of Xs with the number of Xs in that run:
I tried:
awk '{ match($0, /[X]+/);
a = length(substr($0, RSTART, RLENGTH));
gsub(/[X]+/, a) }1'
But it captures only the first match: every run of Xs is replaced with the length of the first run, giving Mike has 3 cats and 3 dogs.
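One way to make the match() approach handle every run, sketched under the assumption that any POSIX awk is available: call match() in a loop on the not-yet-scanned remainder of the line, appending the text before each run followed by RLENGTH. The wrapper name count_x_runs is hypothetical.

```shell
# count_x_runs: hypothetical helper name; uses only POSIX awk features
count_x_runs() {
  awk '{
    out = ""                       # the rebuilt line
    rest = $0                      # part of the line not scanned yet
    while (match(rest, /X+/)) {    # RSTART/RLENGTH describe the leftmost run of Xs
      out = out substr(rest, 1, RSTART - 1) RLENGTH   # text before the run, then its length
      rest = substr(rest, RSTART + RLENGTH)           # continue right after the run
    }
    print out rest                 # append whatever follows the last run
  }'
}

printf 'Mike has XXX cats and XXXXX dogs.\nMikehasXXXcatsandXXXXXdogs\n' | count_x_runs
```

For the two sample lines this prints Mike has 3 cats and 5 dogs. and Mikehas3catsand5dogs. Because no field is ever assigned, awk does not rebuild the line, so the original spacing is kept.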
Expected output:
Mike has 3 cats and 5 dogs.
Mikehas3catsand5dogs
Recommended answer
With your shown samples, could you please try the following. Written and tested in GNU awk (it should work in any awk).
awk '{for(i=1;i<=NF;i++){if($i~/^X+$/){$i=gsub(/X/,"&",$i)}}} 1' Input_file
Sample output will be:
Mike has 3 cats and 5 dogs.
Explanation: Going through all the fields (space-delimited) and checking whether a field starts with X and contains only Xs up to its end. If so, globally substitute each X with itself ("&" stands for the matched text) and store gsub's return value, the number of substitutions made, which equals the number of Xs, back into the current field. Finally, the 1 is an always-true condition whose default action prints the current line.
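The counting trick can be seen in isolation: gsub() returns the number of substitutions it made, and because the replacement "&" re-inserts each match, the string itself is unchanged. A small sketch; the helper name gsub_count_demo is made up for illustration.

```shell
# gsub_count_demo: hypothetical helper name for illustration
gsub_count_demo() {
  awk 'BEGIN {
    s = "XXXXX"
    n = gsub(/X/, "&", s)   # "&" re-inserts each match, so s is left unchanged
    print n, s              # n is the number of substitutions made
  }'
}

gsub_count_demo   # prints: 5 XXXXX
```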
NOTE: As per Ed's comment (under the question), in case your fields may contain characters other than X too, try the following (it also matches values such as XXX456 in any column, replacing the whole field with its count of Xs):
awk '{for(i=1;i<=NF;i++){if($i~/X/){$i=gsub(/X/,"&",$i)}}} 1' Input_file
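A quick check of what this broader variant does to mixed fields, using a hypothetical wrapper any_x_fields around the command above: a field containing any X is replaced entirely by its X count, so the other characters in that field (the 456, the trailing comma) are dropped.

```shell
# any_x_fields: hypothetical wrapper around the second command from the answer
any_x_fields() {
  awk '{for(i=1;i<=NF;i++){if($i~/X/){$i=gsub(/X/,"&",$i)}}} 1'
}

printf 'Mike has XXX456 cats and XXXXX, dogs\n' | any_x_fields
# prints: Mike has 3 cats and 5 dogs
```

Also note that assigning to a field makes awk rebuild $0 with OFS, so several blanks between fields collapse to a single space in both field-based commands.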
Since the OP's samples have changed, adding this solution here; written and tested in GNU awk (it relies on RT, a GNU awk extension).
awk -v RS='X+' '{ORS=(RT ? gsub(/./,"",RT) : "")} 1' Input_file
OR
awk -v RS='X+' '{ORS=(RT ? length(RT) : "")} 1' Input_file
Output of the above code is as follows:
Mike has 3 cats and 5 dogs.
Mikehas3catsand5dogs