Why doesn't sed print an optional group?

Problem description

I have two strings, say foo_bar and foo_abc_bar. I would like to match both of them, and if the first one is matched I would like to emphasize it with = sign. So, my guess was:

echo 'foo_abc_bar' | sed -r 's/(foo).*(abc)?.*(bar)/\1=\2=\3/g'
> foo==bar

echo 'foo_abc_bar' | sed -r 's/(foo).*((abc)?).*(bar)/\1=\2=\3/g'
> foo==

But as the output above shows, neither of them works.

How can I specify an optional group that will match if the string contains it or just skip if not?

Recommended answer

Solution:

echo 'foo_abc_bar' | sed -r 's/(foo)_((abc)_)?(bar)/\1=\3=\4/g'
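To check the substitution against both inputs from the question, here is a minimal sketch. The function name emphasize is a hypothetical helper introduced only for this illustration, and the sketch assumes a sed that accepts -r (such as GNU sed).

```shell
#!/bin/sh
# emphasize: hypothetical wrapper around the substitution from the answer.
# Assumes a sed that supports -r (extended regular expressions).
emphasize() {
    printf '%s\n' "$1" | sed -r 's/(foo)_((abc)_)?(bar)/\1=\3=\4/g'
}

emphasize 'foo_abc_bar'   # optional 'abc_' present
emphasize 'foo_bar'       # optional 'abc_' absent
```

The first call prints foo=abc=bar and the second prints foo==bar, because an optional group that did not take part in the match is substituted as an empty string.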

Why your previous attempts did not work:

.* is greedy. When the regex (foo).*(abc)?.*(bar) is matched against 'foo_abc_bar', (foo) matches 'foo' and the first .* initially consumes the rest of the string ('_abc_bar'). The engine then reaches the required (bar) group, which fails because nothing is left to match, so it backtracks by giving up characters that had been matched by the .*. This continues until the first .* matches only '_abc_', at which point (abc)? and the second .* can both match the empty string and the final group can match 'bar'. So the 'abc' in your string is consumed by the first .* rather than by the capture group, and \2 is left empty.
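One way to see where 'abc' ends up is to wrap each .* in its own group and print every capture. This is only an illustrative sketch: the function name show_groups and the extra parentheses around each .* are additions for inspection, not part of the original expression, and GNU sed with -r is assumed.

```shell
#!/bin/sh
# show_groups: hypothetical helper that prints what each group captured.
# The two (.*) groups are added only so their contents can be printed.
show_groups() {
    printf '%s\n' "$1" |
        sed -r 's/(foo)(.*)(abc)?(.*)(bar)/1=[\1] 2=[\2] 3=[\3] 4=[\4] 5=[\5]/'
}

show_groups 'foo_abc_bar'
```

With GNU sed this prints 1=[foo] 2=[_abc_] 3=[] 4=[] 5=[bar]: the first .* has taken '_abc_', so the optional (abc)? group has nothing left to capture.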

Explanation of my solution:

The first and most important thing is to replace the .* with _. There is no need to match an arbitrary string if you know what the separator will be. The next step is to work out exactly which portion of the string is optional. If the strings 'foo_abc_bar' and 'foo_bar' are both valid, then the 'abc_' in the middle is optional, and we can put it in an optional group using (abc_)?. The last step is to make sure the string 'abc' is still in a capturing group of its own, which we do by wrapping that portion in an additional group, ending up with ((abc)_)?. Because there is now an extra group, the replacement has to change from \1=\2=\3 to \1=\3=\4; \2 would be the string 'abc_' (if it matched). Note that in most regex implementations you could instead use a non-capturing group and keep \1=\2=\3, but sed does not support non-capturing groups.
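The group numbering can be checked directly by printing every capture. A small sketch, assuming a sed that accepts -r; the function name inspect is hypothetical and used only for this illustration.

```shell
#!/bin/sh
# inspect: hypothetical helper that prints the contents of all four groups.
# Groups are numbered by the position of their opening parenthesis:
#   \1 = (foo), \2 = ((abc)_), \3 = (abc), \4 = (bar)
inspect() {
    printf '%s\n' "$1" |
        sed -r 's/(foo)_((abc)_)?(bar)/1=[\1] 2=[\2] 3=[\3] 4=[\4]/'
}

inspect 'foo_abc_bar'   # prints: 1=[foo] 2=[abc_] 3=[abc] 4=[bar]
inspect 'foo_bar'       # prints: 1=[foo] 2=[] 3=[] 4=[bar]
```

This is why the replacement uses \3 rather than \2: the outer group includes the trailing underscore, while the inner group holds only 'abc'.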

Alternative:

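I think the regex above is your best bet because it is the most explicit (it only matches the exact strings you are interested in). In regex engines that support it, you could also avoid the issue described above by using lazy repetition (matches as few characters as possible) instead of greedy repetition (matches as many characters as possible), by changing .* to .*?. Two caveats apply. First, lazy quantifiers are a Perl-style extension and are not part of the POSIX extended regular expressions that sed -r uses, so this approach really belongs to engines such as Perl or PCRE. Second, the (abc) group has to become required, because an optional group that follows a lazy .*? can still match nothing. With those caveats, the expression would look something like this: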

echo 'foo_abc_bar' | sed -r 's/(foo).*?(abc).*?(bar)/\1=\2=\3/g'
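Since (abc) has no trailing ? in this expression, it is worth checking what happens with both inputs. The sketch below assumes GNU sed, which accepts the .*? syntax under -r; lazy quantifiers are not defined by POSIX extended regular expressions, so this run shows the effect of the required group rather than of laziness. The function name lazy_attempt is hypothetical.

```shell
#!/bin/sh
# lazy_attempt: hypothetical wrapper around the alternative expression.
# Assumption: GNU sed, which accepts '.*?' under -r. Other sed
# implementations may reject this syntax.
lazy_attempt() {
    printf '%s\n' "$1" | sed -r 's/(foo).*?(abc).*?(bar)/\1=\2=\3/g'
}

lazy_attempt 'foo_abc_bar'   # prints: foo=abc=bar
lazy_attempt 'foo_bar'       # prints: foo_bar (no match, line left unchanged)
```

The second call shows the trade-off: with (abc) required, 'foo_bar' is no longer matched at all, which is one more reason to prefer the explicit ((abc)_)? form when both strings must be handled.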
