SED: multiple patterns on the same line, how to match/parse the first one

Views: 763 Published: 2020/5/25 0:31:45 Tags: regex parsing sed last-occurrence


Question

I have a file that holds phone number data along with some useless stuff. I'm trying to parse the numbers out, and when there is only one phone number per line, it's no problem. But when a line has multiple numbers, sed matches the last one (even though everywhere I read says it should match only the first occurrence of the pattern), and I can't get the other numbers out.

My data.txt:

bla bla bla NUM:09011111111 bla bla bla bla NUM:08022222222 bla bla bla

When I parse the data, my idea was first to remove all the initial "bla bla bla" in front of the first phone number (so I search for the first occurrence of 'NUM:'), then remove everything after the phone number, which leaves the number. After that I want to parse the next occurrence from the leftover string.

But when I try this with sed, I always get the last number on the line:

>sed 's/.*NUM://' data.txt
08022222222 bla bla bla
>

Primarily I would like to understand what's wrong with my understanding of sed. Of course, more efficient suggestions are welcome! Doesn't my sed command say: replace everything before 'NUM:' with '' (empty)? Why does it always match the last occurrence?

Thanks!

Recommended answer

This might work for you:

echo "bla bla bla NUM:09011111111 bla bla bla bla NUM:08022222222 bla bla bla" |
sed 's/NUM:/\n&/g;s/[^\n]*\n\(NUM:[0-9]*\)[^\n]*/\1 /g;s/.$//'
NUM:09011111111 NUM:08022222222

The problem is understanding that .* is greedy, i.e. it matches the longest possible string, not the shortest, so .*NUM: extends up to the last NUM: on the line. By placing a unique character in front of each string we're interested in (NUM:...) and then deleting everything that is not that unique character ([^\n]*) followed by the unique character (\n), we effectively split the line into manageable pieces. \n works as the unique character because sed uses it as the line delimiter, so it cannot already exist in the line.
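To make the greedy behaviour concrete, here is a minimal sketch that contrasts the original command with two ways of getting at the first number. It is not part of the answer above. It assumes GNU sed (for \n in the replacement text) and a grep that supports the -o option; the variable names line, last, first and all are only illustrative.

```shell
line='bla bla bla NUM:09011111111 bla bla bla bla NUM:08022222222 bla bla bla'

# Greedy: .* grabs as much as it can, so .*NUM: ends at the LAST "NUM:".
last=$(printf '%s\n' "$line" | sed 's/.*NUM://')

# First number only. Without the g flag, s replaces just the first "NUM:".
# 1) turn the first "NUM:" into a newline marker (GNU sed assumed),
# 2) delete everything up to and including that marker,
# 3) delete from the first non-digit to the end of the line.
first=$(printf '%s\n' "$line" | sed 's/NUM:/\n/; s/.*\n//; s/[^0-9].*//')

# All numbers, one per line: grep -o prints each match on its own line,
# then sed strips the "NUM:" prefix.
all=$(printf '%s\n' "$line" | grep -o 'NUM:[0-9]*' | sed 's/NUM://')

printf 'last:  %s\n' "$last"
printf 'first: %s\n' "$first"
printf 'all:\n%s\n' "$all"
```

With one number per line in the output, the first one can also be taken by piping the grep -o result through head -n 1, which avoids relying on \n inside sed.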
