How to extract multiple strings with single regex expression in Awk

Question

I have the following strings:

Mike has XXX cats and XXXXX dogs.
MikehasXXXcatsandXXXXXdogs

I would like to replace each run of Xs with the number of Xs in that run:

I tried:

awk '{ match($0, /[X]+/);
  a = length(substr($0, RSTART, RLENGTH));
  gsub(/[X]+/, a) }1'

But it captures only the first match, so every run of Xs is replaced with the length of the first run.

Expected output:

Mike has 3 cats and 5 dogs.
Mikehas3catsand5dogs

Recommended answer

With your shown samples, please try the following. Written and tested in GNU awk (should work in any awk).

awk '{for(i=1;i<=NF;i++){if($i~/^X+$/){$i=gsub(/X/,"&",$i)}}} 1'  Input_file

Sample output will be:

Mike has 3 cats and 5 dogs.

Explanation: Loop over all fields (whitespace-delimited) and check whether a field consists only of Xs from its start to its end. If it does, gsub replaces every X with itself, which leaves the text unchanged but returns the number of replacements, and that count is assigned back to the field. The trailing 1 is an always-true pattern, so awk prints the current line.
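The counting step can be tried in isolation. The sketch below is only an illustration of how gsub returns the number of substitutions it made; the function name count_x is hypothetical and not part of the original answer.

```shell
# count_x is a hypothetical helper name, used only for this illustration.
# gsub() returns how many substitutions it made; "&" stands for the matched
# text, so the input is left unchanged while we still obtain the count.
count_x() {
  printf '%s\n' "$1" | awk '{ n = gsub(/X/, "&"); print n }'
}

count_x 'XXX'      # prints 3
count_x 'XXX456'   # prints 3
count_x 'cats'     # prints 0
```

Because the replacement is the match itself, the only useful effect of the call is its return value.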

NOTE: As per Ed's comment (under the question), if your fields may contain characters other than X, try the following. It also matches a value such as XXX456 in any column, although the whole field is then replaced by the count of Xs (XXX456 becomes 3):

awk '{for(i=1;i<=NF;i++){if($i~/X/){$i=gsub(/X/,"&",$i)}}} 1'  Input_file



EDIT: Since OP's samples have changed, adding this solution here as well. Written and tested in GNU awk (it relies on gawk's RT variable).

awk -v RS='X+' '{ORS=(RT ? gsub(/./,"",RT) : "")} 1' Input_file

OR

awk -v RS='X+' '{ORS=(RT ? length(RT) : "")} 1' Input_file

Output of the above code will be as follows:

Mike has 3 cats and 5 dogs.
Mikehas3catsand5dogs
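For awk implementations without RT, one possible alternative (not from the original answer) is to repeat match() in a loop, which is close to what the question attempted. This is a minimal sketch under that assumption; the function name replace_runs is hypothetical.

```shell
# replace_runs is a hypothetical helper name used only for this sketch.
replace_runs() {
  awk '{
    out = ""
    # match() sets RSTART and RLENGTH for the leftmost run of Xs in $0
    while (match($0, /X+/)) {
      # keep the text before the run, then append the length of the run
      out = out substr($0, 1, RSTART - 1) RLENGTH
      # continue scanning with the text that follows the run
      $0 = substr($0, RSTART + RLENGTH)
    }
    # append whatever remains after the last run
    print out $0
  }' "$@"
}

printf '%s\n' 'Mike has XXX cats and XXXXX dogs.' 'MikehasXXXcatsandXXXXXdogs' | replace_runs
```

Unlike the field-based version, this does not depend on whitespace around the Xs, so it handles both sample lines.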
