PCRE regex backreference works, but subroutines do not
Question
I am trying to match the text:
1. "HeyHey HeyHey"
2. "HeyHey HeyHeyy"
using the regexes:
a /(\w+) \1\w/
b /(\w+) (\w+)\w/
c /(\w+) (?1)\w/
- Regex a matches 1 completely, and 2 completely except for the last 'y'.
- Regex b matches 1 and 2 completely.
- Regex c does not match 1 or 2.
Following http://www.rexegg.com/regex-disambiguation.html#subroutines I thought b and c were equivalent. But apparently, they are not.
What is the difference? Why is the subroutine not working, while copying the same regex works?
Experiment here: https://regex101.com/#pcre
Recommended answer
It is because in PCRE, the reference to a subpattern ((?1) here) is atomic by default.
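(Note that this behaviour is particular to PCRE and is not shared by Perl. From PCRE2 10.30 onwards, subroutine calls are no longer treated as atomic, so this applies to the original PCRE library and to older PCRE2 releases.)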
The subpattern is \w+ (with a greedy quantifier), so it matches all the word characters (HeyHeyy in the second string). Since (?1) is atomic, the regex engine can't backtrack into it and give back the last y to make \w succeed.
You can obtain the same result with this pattern:
/(\w+) (?>\w+)\w/
#      ^-----^-- atomic group
which doesn't match the string, whereas without the atomic group the pattern succeeds:
/(\w+) \w+\w/
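The difference between these two patterns can be checked outside PCRE as well. The following is a minimal sketch in Python, not the original poster's setup: Python's re module is not PCRE and has no (?1) subroutine syntax, so it only reproduces the atomic-group equivalent shown above. The sample string is an assumption chosen so that the second word is HeyHeyy. Python accepts (?>...) from version 3.11; on older versions the sketch falls back to the lookahead-plus-backreference idiom, which also refuses to give characters back.

```python
import re
import sys

# Assumed sample input: the second word ends in "HeyHeyy".
text = "HeyHey HeyHeyy"

# Plain greedy \w+ may backtrack: it gives back the final "y" so \w can match.
plain = re.compile(r"(\w+) \w+\w")

# Atomic version: once \w+ has consumed the whole word, nothing is given back.
if sys.version_info >= (3, 11):
    atomic = re.compile(r"(\w+) (?>\w+)\w")
else:
    # Fallback for older Pythons: a lookahead is never re-entered, so
    # capturing inside it and replaying the capture acts like (?>\w+).
    atomic = re.compile(r"(\w+) (?=(\w+))\2\w")

plain_match = plain.search(text)
atomic_match = atomic.search(text)

print(plain_match.group(0) if plain_match else None)  # HeyHey HeyHeyy
print(atomic_match)                                   # None
```

The plain pattern succeeds only because \w+ gives up one character after its first attempt. The atomic version discards that option, which is what the atomic (?1) call does in regex c.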
More about atomic groups: http://regular-expressions.info/atomic.html
This particularity is also described here (but only in a recursive context): http://www.rexegg.com/regex-recursion.html (see "Recursion Depths are Atomic")