Why does (.*)* make two matches and select nothing in group $1?
This arose from a discussion on formalizing regular expression syntax. I've seen this behavior with several regular expression parsers, hence I tagged it language-agnostic.
Take the following expression (adjust it for your favorite language):
replace("input", "(.*)*", "$1")
It will return an empty string. Why?
Even more curiously, the expression replace("input", "(.*)*", "A$1B") will return the string ABAB. Why the double empty match?
Disclaimer: I know about backtracking and greedy matches, but the rules laid out by Jeffrey Friedl seem to dictate that .* matches everything and that no further backtracking or matching is done. Then why is $1 empty?
Note: compare with (.+)*, which returns the input string. However, http://regexhero.com shows that there are still two matches, which seems odd for the same reasons as above.
Let's see what happens:
- (.*) matches "input".
- "input" is captured into group 1.
- The regex engine is now positioned at the end of the string. But since (.*) is repeated, another match attempt is made:
- (.*) matches the empty string after "input".
- The empty string is captured into group 1, overwriting "input".
- $1 now contains the empty string.
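The steps above can be checked with a small Python sketch. It is only an illustration and assumes Python 3.7 or later, where re.sub also replaces an empty match that directly follows a non-empty one; older Python versions and other engines may behave differently. The helper name group1_per_match is made up for this example, and Python writes the backreference as \1 rather than $1.

```python
import re


def group1_per_match(pattern, text):
    """Return (span, group 1) for every match the engine reports in text."""
    return [(m.span(), m.group(1)) for m in re.finditer(pattern, text)]


# (.*)* : the repeated group gets one more (empty) iteration at the end of
# the string, so group 1 ends up holding the empty string.
star_matches = group1_per_match(r"(.*)*", "input")
star_replaced = re.sub(r"(.*)*", r"A\1B", "input")

# (.+)* : the extra iteration fails because .+ needs at least one character,
# so group 1 keeps "input" in the first match.
plus_matches = group1_per_match(r"(.+)*", "input")
plus_replaced = re.sub(r"(.+)*", r"\1", "input")

for label, matches in (("(.*)*", star_matches), ("(.+)*", plus_matches)):
    for span, group1 in matches:
        print(label, span, repr(group1))
print(star_replaced)
print(plus_replaced)
```

Under those assumptions, both patterns should report two matches, one covering "input" and one empty match at the end of the string. Only (.*)* leaves group 1 empty in the first match, which is why its replacement comes out as ABAB.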
A good question from the comments:
Then why does replace("input", "(input)*", "A$1B") return "AinputBAB"?
- (input)* matches "input". It is replaced by "AinputB".
- (input)* matches the empty string. It is replaced by "AB" ($1 is empty because it didn't participate in the match).
- Result: "AinputBAB"
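The two replacements in the (input)* example above can be traced one match at a time. The following is a minimal sketch, assuming Python 3.7 or later (earlier versions skip the empty match that directly follows "input"). The helper trace_replacements and its handling of the $1 placeholder are hypothetical and exist only for this illustration.

```python
import re


def trace_replacements(pattern, template, text):
    """Replace every match and record what each match contributed.

    `template` uses the question's $1 notation; only $1 is supported here.
    """
    trace = []

    def expand(match):
        captured = match.group(1)  # None if group 1 took no part in the match
        piece = template.replace("$1", captured or "")
        trace.append((match.span(), captured, piece))
        return piece

    return re.sub(pattern, expand, text), trace


result, trace = trace_replacements(r"(input)*", "A$1B", "input")
for span, captured, piece in trace:
    print(span, repr(captured), piece)
print(result)
```

With these assumptions the trace should show a first match spanning the whole string with group 1 set to "input", followed by an empty match at the end where group 1 is None, giving "AinputB" plus "AB".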