Regex lazy vs greedy confusion
Question
I'm a little confused about greedy vs lazy matching in regular expressions. It's probably very simple, and it feels like I'm missing something obvious.
I've simplified my problem as much as I can to make it clear. Consider the following string and regex pattern.
string:
aaxxxb
pattern:
(?<=a)(.*?)(?=b)
result:
axxx
what I expected:
xxx
This result is what I would expect from using .* instead of .*?. What am I missing?
Likewise, a.*?b gives me aaxxxb. Why is this? Shouldn't a lazy quantifier (like .*?) match as few characters as possible?
Recommended answer
You are missing the fact that a regex engine works from left to right, position by position, and succeeds as soon as it finds a match at the current position.
In your example, the first position where the pattern succeeds is at the second "a": the lookbehind (?<=a) is already satisfied there by the first "a".
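Laziness only acts on the right-hand side: once the starting position is fixed, .*? extends the match to the right as little as possible, but it never moves the start forward.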
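The behavior described above can be checked with a short sketch. It uses Python's re module, which is an assumption here since the question does not name a regex flavor; the second input string aaxxxbyyb is also made up for illustration, to show a case where lazy and greedy actually differ.

```python
import re

text = "aaxxxb"

# The engine tries start positions from left to right. At index 0 the
# lookbehind (?<=a) fails; at index 1 it succeeds, so the match starts
# there. The lazy .*? only decides how far the match extends to the right.
lazy = re.search(r"(?<=a)(.*?)(?=b)", text)
print(lazy.start(), lazy.group())  # 1 axxx

# Hypothetical input with two "b"s: here laziness makes a visible difference,
# but only at the right-hand end. Both matches still start at index 1.
text2 = "aaxxxbyyb"
lazy2 = re.search(r"(?<=a)(.*?)(?=b)", text2)
greedy2 = re.search(r"(?<=a)(.*)(?=b)", text2)
print(lazy2.group())    # axxx
print(greedy2.group())  # axxxbyy
```

In both cases the match begins at the same place; the quantifier only chooses where it ends.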
If you want to obtain "xxx", a better way is to use a negated character class [^ab]* instead of .*?
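A minimal sketch of that suggestion, again assuming Python's re module and keeping the lookarounds from the question's pattern:

```python
import re

text = "aaxxxb"

# [^ab]* cannot consume an "a", so an attempt starting at index 1 matches
# nothing before hitting the second "a" and then fails the lookahead (?=b).
# The engine moves on to index 2, where "xxx" followed by "b" succeeds.
negated = re.search(r"(?<=a)[^ab]*(?=b)", text)
print(negated.start(), negated.group())  # 2 xxx

# The same idea without lookarounds, for comparison with a.*?b:
plain = re.search(r"a[^ab]*b", text)
print(plain.group())  # axxxb
```

The character class works because it forbids the match from containing an earlier "a", which forces the engine to fail at the first candidate position and retry further right.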
Note: not exactly related to the subject, but good to know: with alternation, a DFA regex engine will try to return the longest possible match, whereas an NFA (backtracking) engine gives you the first alternative that succeeds.
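The NFA side of this note can be illustrated with Python's re module, which is a backtracking engine (the choice of Python and of the patterns a|ab and ab|a is an assumption made for this sketch). A DFA or POSIX-style leftmost-longest engine would be expected to return "ab" for both patterns; that is not demonstrated here because the standard library has no such engine.

```python
import re

# A backtracking engine tries alternatives in the order they are written
# and stops at the first one that lets the overall match succeed.
first = re.search(r"a|ab", "ab").group()
second = re.search(r"ab|a", "ab").group()

print(first)   # a
print(second)  # ab
```

So with a backtracking engine, the order of the alternatives affects the result.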