Regex lazy vs greedy confusion

Question

I'm a little confused about regular expressions and greedy vs lazy. It's really very simple and it feels like I'm missing something obvious.

I've simplified my problem as much as I can to make it clear. Consider the following string and regex pattern.

string:
aaxxxb

pattern:
(?<=a)(.*?)(?=b)

result:
axxx

what I expected:
xxx

That is the result I would expect from the greedy .* rather than the lazy .*?, so what am I missing?

Likewise, a.*?b gives me aaxxxb. Why is this? Shouldn't a lazy quantifier like .*? match as few characters as possible?

Recommended answer

You are missing the fact that a regex engine works from left to right, position by position, and succeeds as soon as it finds a match at the current position.

In your example, the first position where the pattern succeeds is at the second "a".
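
To see this concretely, here is a minimal sketch assuming Python's `re` module (the question does not say which regex flavor it uses, but Python supports the lookbehind in the pattern). The hypothetical helper `first_match_left_to_right` mimics what `re.search` does: it attempts an anchored match at each start index in turn and stops at the first success.

```python
import re

# Pattern and subject string taken from the question.
PATTERN = re.compile(r"(?<=a)(.*?)(?=b)")
SUBJECT = "aaxxxb"


def first_match_left_to_right(pattern, subject):
    """Hypothetical helper: try an anchored match at each start index,
    left to right, and return the first one that succeeds."""
    for start in range(len(subject) + 1):
        # pattern.match(subject, start) is anchored at `start`, but the
        # lookbehind can still inspect the character just before it.
        m = pattern.match(subject, start)
        if m:
            return start, m.group()
    return None


print(first_match_left_to_right(PATTERN, SUBJECT))  # (1, 'axxx')
print(PATTERN.search(SUBJECT).group())              # axxx
# A shorter match does exist if the attempt starts one index later:
print(PATTERN.match(SUBJECT, 2).group())            # xxx
```

The shorter "xxx" is only found when the attempt starts at index 2, but a normal search never gets there because the attempt at index 1 already succeeded. Laziness decides how far that successful attempt extends, not where it begins.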

Laziness only acts on the right side: it changes where a match ends, not where it starts.
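
A small sketch of this, again assuming Python's `re` module. The subject string is a hypothetical variant of the question's input with "yyb" appended, so that there are two possible closing "b"s to choose between. Both quantifiers start at the same index; only the end position differs.

```python
import re

# Hypothetical subject: the question's string with "yyb" appended.
subject = "aaxxxbyyb"

for pattern in (r"a.*?b", r"a.*b"):
    m = re.search(pattern, subject)
    # Lazy and greedy both start at index 0; only m.end() differs.
    print(pattern, m.span(), m.group())
```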

If you want to obtain "xxx", a better way is to use a negated character class [^ab]* instead of .*?
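
A minimal sketch of the negated-class approach, assuming Python's `re` module. The first pattern is the question's pattern with `.*?` replaced by `[^ab]*`; the second is a variant without lookarounds that captures the middle part in a group.

```python
import re

subject = "aaxxxb"

# At index 1 (after the first "a"), [^ab]* cannot consume the second "a",
# so (?=b) fails there and the engine moves on to index 2, where it
# matches "xxx".
m = re.search(r"(?<=a)[^ab]*(?=b)", subject)
print(m.group(), m.span())   # xxx (2, 5)

# Without lookarounds: consume the delimiters and capture the middle.
m = re.search(r"a([^ab]*)b", subject)
print(m.group(1), m.span())  # xxx (1, 6)
```

The class works because it refuses to cross an "a", so no attempt that begins before the last "a" can succeed.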

Note: not exactly related to the subject, but good to know: with alternation, a DFA regex engine returns the longest match, whereas an NFA engine returns the first alternative that succeeds.
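
A small illustration, assuming Python's `re` module, which is a backtracking (NFA-style) engine. Python's standard library has no DFA engine, so the hypothetical helper `longest_alternative` only approximates longest-match behavior for this simple case anchored at the start of the string; it is not how a real DFA engine works internally.

```python
import re

# Backtracking engine: the first alternative that succeeds wins,
# even if a later alternative would match more text.
print(re.match(r"a|ab", "ab").group())  # a
print(re.match(r"ab|a", "ab").group())  # ab


def longest_alternative(alternatives, subject):
    """Hypothetical helper approximating longest-match behavior for a plain
    alternation anchored at the start of `subject`: try every alternative
    and keep the longest successful match."""
    matches = (re.match(alt, subject) for alt in alternatives)
    return max((m.group() for m in matches if m), key=len, default=None)


print(longest_alternative([r"a", r"ab"], "ab"))  # ab
```

With a backtracking engine, ordering the alternatives from longest to shortest is the usual way to get the longer match.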
