Why does the regex engine choose to match `..X` from the pattern `.X|..X|X.`?
Problem description
I have a string
1234X5678
and I use this regex to match a pattern
.X|..X|X.
I got
34X
The question is: why didn't I get `4X` or `X5`?
Why does the regex engine choose to match the second alternative?
Recommended answer
The key point here is:
By default, a regex engine scans the input from left to right.
So, you have an alternation pattern `.X|..X|X.` and you run it against `1234X5678`. See what happens:
Each alternative branch is tested at each position in the string, from left to right.
Steps 1-7 show how the engine tries to match the characters at the beginning of the string. However, none of the branches (neither `.X`, nor `..X`, nor `X.`) matches `12` or `123`.
Steps 8-13 just repeat the same failing scenario one position further, as none of the branches matches `23` or `234`.
Steps 14-19 show a success scenario, because `34X` can be matched by Branch 2 (`..X`).
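The walk-through above can be reproduced with a minimal Python sketch. It assumes Python's `re` module as the backtracking engine. The helper `first_match` is a hypothetical name introduced only for illustration, and it imitates the engine's search order rather than its internals.

```python
import re

text = "1234X5678"
branches = [".X", "..X", "X."]  # the alternatives, in pattern order


def first_match(text, branches):
    """Try every branch at each start position, scanning left to right.

    Returns (start position, 1-based branch number, matched text),
    or None if no branch matches anywhere.
    """
    for pos in range(len(text)):
        for number, branch in enumerate(branches, start=1):
            m = re.compile(branch).match(text, pos)
            if m:
                return pos, number, m.group()
    return None


print(first_match(text, branches))            # (2, 2, '34X')
print(re.search(".X|..X|X.", text).group())   # 34X
```

Positions 0 and 1 fail for all three branches. At position 2, Branch 1 fails (`4` is not `X`) and Branch 2 succeeds, so the search stops there.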
The regex engine never starts a match attempt at the position before `4`, since that character has already been matched and consumed as part of `34X`.
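To see the consumption directly, one can list all non-overlapping matches. This is a small sketch assuming Python's `re` module. The variable names are illustrative only.

```python
import re

text = "1234X5678"

# Collect (start, end, matched text) for every non-overlapping match.
matches = [(m.start(), m.end(), m.group())
           for m in re.finditer(r".X|..X|X.", text)]

print(matches)  # [(2, 5, '34X')]
```

After `34X` is consumed, the search resumes at index 5 (`5678`), where no `X` is left, so neither `4X` nor `X5` is ever reported.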
Another conclusion:
The order of alternatives matters, and in NFA regex engines the first alternative that matches wins. BUT the winner does not have to be the alternative listed first or the shortest one: a longer alternative listed later can match starting at an earlier position in the string, and the leftmost match is found first.
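Both effects can be checked with a short sketch, again assuming Python's `re` module. The shortened patterns and the test string `4X5` are made up for illustration and are not from the question.

```python
import re

text = "1234X5678"

# X. is listed first, yet ..X wins: it matches from index 2,
# while X. could only match from index 4.
print(re.search(r"X.|..X", text).group())    # 34X

# When two alternatives can match at the same start position,
# the one listed first wins, even if it is shorter.
print(re.search(r".X|.X5", "4X5").group())   # 4X
print(re.search(r".X5|.X", "4X5").group())   # 4X5
```

So the start position is decided first, and the order of alternatives only breaks ties among branches that can match at that same position.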