PCRE regex backreference works, but subroutines do not

Question

I am trying to match the text:

1.嘿嘿嘿嘿"

2.嘿嘿嘿嘿"

using the regexes:

a /(\w+) \1\w/

b /(\w+) (\w+)\w/

c /(\w+) (?1)\w/

  • Regex a matches 1 and 2 fully, except for the last 'y'.
  • Regex b matches 1 and 2 fully.
  • Regex c matches neither 1 nor 2.

Following http://www.rexegg.com/regex-disambiguation.html#subroutines, I thought b and c were equivalent. Apparently they are not.

What is the difference? Why does the subroutine call fail, while writing out the same subpattern a second time works?

Experiment here: https://regex101.com/#pcre

Recommended answer

This is because in PCRE a reference to a subpattern (here (?1)) is atomic by default.

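(Note that this behaviour is specific to PCRE and is not shared by Perl. It applies to the original PCRE library and to PCRE2 releases before 10.30; from 10.30 onwards PCRE2 can backtrack into subroutine calls, as Perl does.)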

The subpattern is \w+, with a greedy quantifier, so the call (?1) consumes every word character available (HeyHeyy in the second string). Because (?1) is atomic, the regex engine cannot backtrack into it and give back the last y, so the trailing \w has nothing left to match and the pattern fails.
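The give-back step that an atomic call forbids can be seen on its own with an ordinary, non-atomic group. This is only a minimal sketch in Python's standard `re` module (not PCRE); the two-group pattern is made up for illustration and `HeyHeyy` is the word quoted above.

```python
import re

# "HeyHeyy" is the word quoted in the answer; the pattern is illustrative only.
word = "HeyHeyy"

# The greedy \w+ first consumes all seven characters. The trailing \w then
# has nothing to match, so the engine backtracks and \w+ gives back one
# character. An atomic group or atomic subroutine call forbids this step.
m = re.fullmatch(r"(\w+)(\w)", word)
greedy_part, last_char = m.groups()
print(greedy_part, last_char)
```

With a plain group the match succeeds only because `\w+` is allowed to shrink by one character after its first attempt.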

You can reproduce the same behaviour with this pattern:

/(\w+) (?>\w+)\w/
     # ^-----^-- atomic group

This pattern does not match the string, whereas without the atomic group the pattern succeeds:

/(\w+) \w+\w/
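The contrast between the two patterns can be checked outside PCRE as well. The sketch below uses Python's `re` module and makes two assumptions: the sample text `HeyHeyy HeyHeyy` is a stand-in, not necessarily the question's exact string, and the atomic group is emulated with the lookahead-plus-backreference idiom `(?=(\w+))\2` so that it runs on Python versions older than 3.11, where `(?>...)` is not accepted.

```python
import re

# Assumed sample text; the question's exact strings are not reproduced here.
text = "HeyHeyy HeyHeyy"

# Plain group: \w+ can give back the final "y", so the pattern matches.
plain = re.search(r"(\w+) \w+\w", text)

# Emulated atomic group: once a lookahead has succeeded the engine never
# backtracks into it, so (?=(\w+))\2 consumes the whole word and keeps it.
# This stands in for (?>\w+), which Python's re only accepts from 3.11 on.
atomic = re.search(r"(\w+) (?=(\w+))\2\w", text)

print(plain.group(0) if plain else None)
print(atomic)
```

On Python 3.11 or later, writing the second pattern as `(\w+) (?>\w+)\w` should behave the same way. The emulation is used here only for portability.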

More about atomic groups: http://regular-expressions.info/atomic.html

This peculiarity is also described here, though only in the context of recursion: http://www.rexegg.com/regex-recursion.html (see "Recursion Depths are Atomic")
