How does the ? make a quantifier lazy in regex?

Question

I've been looking into regex lately and learned that the ? operator makes the *, +, or ? quantifiers lazy. My question is: how does it do that? Is *? (for example) a special operator, or does the ? have an effect on the *? In other words, does regex recognize *? as one operator in itself, or as the two separate operators * and ?? If *? is recognized as two separate operators, how does the ? affect the * to make it lazy? If ? means that the * is optional, shouldn't that mean the * doesn't have to exist at all? If so, then in a pattern like .*?, wouldn't regex just match individual letters or the whole string instead of the shorter string? Please explain, I'm desperate to understand. Many thanks.

Recommended answer

I think a little history will make it easier to understand. When Larry Wall wanted to grow regex syntax to support new features, his options were severely limited. He couldn't just decree (for example) that % is now a metacharacter that supports new feature "XYZ". That would break the millions of existing regexes that happened to use % to match a literal percent sign.

What he could do is take an already-defined metacharacter and use it in a position where its original function wouldn't make sense. For example, any regex that contained two quantifiers in a row would have been invalid, so it was safe to say that a ? after another quantifier now turns it into a reluctant quantifier (a much better name than "lazy" IMO; "non-greedy" is good too). So the answer to your question is that the ? doesn't modify the *; *? is a single entity: a reluctant quantifier. The same is true of the + in possessive quantifiers (*+, {0,2}+, etc.).
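To make the difference concrete, here is a small sketch using Python's `re` module (my choice of engine for illustration; the answer itself is about Perl, and the sample string and variable names are made up). It shows greedy versus reluctant matching, shows that `*?` can still match a long span when the rest of the pattern demands it, and shows that two ordinary quantifiers in a row are rejected.

```python
import re

# Hypothetical sample text for illustration.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as it can, then backs off to the LAST "</b>".
greedy = re.search(r"<b>.*</b>", text).group()

# Reluctant: .*? grabs as little as it can, stopping at the FIRST "</b>".
reluctant = re.search(r"<b>.*?</b>", text).group()

# "*?" does not mean "an optional *": when the whole string must match,
# the reluctant quantifier still expands as far as it has to.
forced = re.fullmatch(r"<b>.*?</b>", text).group()

# Two ordinary quantifiers in a row are a syntax error in this engine,
# which is the kind of gap the "quantifier followed by ?" spelling filled.
try:
    re.compile(r"a**")
    double_quantifier_ok = True
except re.error:
    double_quantifier_ok = False

print(greedy)                # <b>one</b> and <b>two</b>
print(reluctant)             # <b>one</b>
print(forced == text)        # True
print(double_quantifier_ok)  # False
```

So a reluctant quantifier changes the order in which the engine tries match lengths (shortest first), not whether the repetition is allowed to happen.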

A similar process occurred with group syntax. It would never make sense to have a quantifier right after an unescaped opening parenthesis, so it was safe to say that (? now marks the beginning of a special group construct. But the question mark alone would only support one new feature, so the ? itself has to be followed by at least one more character to indicate which kind of group it is ((?:...), (?<!...), etc.). Again, the (?: is a single entity: the opening delimiter of a non-capturing group.
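The group constructs mentioned above can be tried out the same way. This is again a sketch in Python's `re` module with made-up sample strings, not anything taken from Perl's implementation.

```python
import re

# Plain parentheses capture: both groups are numbered.
capturing = re.match(r"(ab)+(c)", "ababc").groups()

# (?:...) groups without capturing: only (c) is numbered.
non_capturing = re.match(r"(?:ab)+(c)", "ababc").groups()

# (?<!...) is a negative lookbehind: match "bar" only when it is
# NOT immediately preceded by "foo".
hits = [m.start() for m in re.finditer(r"(?<!foo)bar", "foobar bar")]

# A quantifier right after "(" has nothing to repeat, so this engine
# rejects it; that is why "(?" was available as a new opening delimiter.
try:
    re.compile(r"(*a)")
    quantifier_after_paren_ok = True
except re.error:
    quantifier_after_paren_ok = False

print(capturing)                  # ('ab', 'c')
print(non_capturing)              # ('c',)
print(hits)                       # [7]
print(quantifier_after_paren_ok)  # False
```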

I don't know offhand why he used the question mark both times. I do know Perl 6 Rules (a bottom-up rewrite of Perl 5 regexes) has done away with all that crap and uses an infinitely more sensible syntax.
