SED: multiple patterns on the same line, how to match/parse the first one

Question

I have a file that holds phone number data along with some useless stuff. I'm trying to parse the numbers out, and when there is only one phone number per line, it's no problem. But when a line has multiple numbers, sed matches the last one (even though everywhere says it should match only the first occurrence of the pattern?), and I can't get the other numbers out.

My data.txt:

bla bla bla NUM:09011111111 bla bla bla bla NUM:08022222222 bla bla bla

When I parse the data, my idea was first to remove all the initial "bla bla bla" in front of the first phone number (so I search for the first occurrence of 'NUM:'), then remove everything after the phone number, and get the number. After that I want to parse the next occurrence from the leftover string.

But when I try this with sed, I always get the last number on the line:

$ sed 's/.*NUM://' data.txt
08022222222 bla bla bla

Primarily I would like to understand what's wrong with my understanding of sed. Of course, more efficient suggestions are welcome! Doesn't my sed command say: replace everything before 'NUM:' with '' (empty)? Why does it always match the last occurrence?

Thanks!

Recommended answer

This might work for you:

echo "bla bla bla NUM:09011111111 bla bla bla bla NUM:08022222222 bla bla bla" |
sed 's/NUM:/\n&/g;s/[^\n]*\n\(NUM:[0-9]*\)[^\n]*/\1 /g;s/.$//'
NUM:09011111111 NUM:08022222222

The problem is that .* is greedy: it matches the longest possible string, not the shortest. In 's/.*NUM://' the match starts at the beginning of the line, and .* extends as far right as it can while still leaving a NUM: to match, so it stops at the last NUM:. The command above works around this by placing a unique character in front of each string we're interested in (NUM:...). That character is \n: sed uses it as the line delimiter, so it cannot already exist in the line. Deleting everything that is not that unique character ([^\n]*) followed by the unique character (\n) then effectively splits the string into manageable pieces. Note that \n in the replacement part works in GNU sed; other sed implementations may not support it.
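To make the greedy behaviour concrete, here is a minimal sketch that avoids the \n trick. The function names (greedy_strip, first_number, all_numbers) are made up for illustration and are not part of the original answer. It assumes each number consists of digits only and directly follows NUM:, as in the sample data. The last function relies on grep -o, which is not required by POSIX but is available in GNU and BSD grep.

```shell
#!/bin/sh
# Sample line from data.txt in the question.
line='bla bla bla NUM:09011111111 bla bla bla bla NUM:08022222222 bla bla bla'

# Greedy: .* extends as far right as possible, so the NUM: that ends
# the match is the LAST one on the line.
greedy_strip() {
    printf '%s\n' "$1" | sed 's/.*NUM://'
}

# First number only (hypothetical helper):
#   1. with no leading .*, the leftmost NUM: is matched; keep it and its
#      digits, and drop the rest of the line
#   2. only one NUM: is left now, so stripping up to it is unambiguous
# -n plus the p flag prints only lines that actually contained NUM:.
first_number() {
    printf '%s\n' "$1" |
        sed -n -e 's/\(NUM:[0-9]*\).*/\1/' -e 's/.*NUM://p'
}

# All numbers, one per line (hypothetical helper; assumes grep supports -o).
all_numbers() {
    printf '%s\n' "$1" | grep -o 'NUM:[0-9]*' | sed 's/^NUM://'
}

greedy_strip "$line"
first_number "$line"
all_numbers "$line"
```

The first_number variant works because a regex match always starts at the leftmost possible position; greediness only decides how far the match extends from there. Truncating the line right after the first NUM:... before removing the prefix leaves nothing further right for .* to latch onto.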
