Why does (.*)* make two matches and select nothing in group $1?

Question

This arose from a discussion on formalizing regular expression syntax. I've seen this behavior with several regular expression engines, hence I tagged it language-agnostic.

Take the following expression (adjust it for your favorite language):

replace("input", "(.*)*", "$1")

It returns an empty string. Why?

Even more curiously, the expression replace("input", "(.*)*", "A$1B") returns the string ABAB. Why the double empty match?

Disclaimer: I know about backtracking and greedy matches, but the rules laid out by Jeffrey Friedl seem to dictate that .* matches everything and that no further backtracking or matching is done. Then why is $1 empty?

Note: compare with (.+)*, which returns the input string. However, http://regexhero.com shows that there are still two matches, which seems odd for the same reasons as above.

Solution

Let's see what happens:

  1. (.*) matches "input".
  2. "input" is captured into group 1.
  3. The regex engine is now positioned at the end of the string. But since (.*) is repeated, the engine tries another iteration of the group:
  4. (.*) matches the empty string after "input".
  5. The empty string is captured into group 1, overwriting "input".
  6. $1 now contains the empty string.
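The steps above can be traced with a small Python toy model. It is not a real regex engine: `trace_dot_star_star` is a hypothetical helper that assumes, as the answer describes, that the engine permits one zero-length iteration of the starred group and then stops because that iteration made no progress. It also assumes the input contains no newlines, so `.*` always runs to the end of the string.

```python
from typing import List, Tuple


def trace_dot_star_star(text: str, start: int = 0) -> Tuple[int, List[str]]:
    """Toy model of ONE match attempt of (.*)* beginning at `start`.

    Hypothetical helper, not a real regex engine. Assumptions:
    - the input has no newlines, so greedy .* always runs to the end;
    - the engine allows a zero-length iteration of the starred group,
      then stops repeating because that iteration made no progress.

    Returns the end position of the match and every value that group 1
    held, in order; the last entry is what $1 reports.
    """
    captures: List[str] = []
    pos = start
    while True:
        end = len(text)                 # greedy .* consumes the rest of the string
        captures.append(text[pos:end])  # each iteration overwrites group 1
        if end == pos:                  # zero-length iteration: no progress, so stop
            return end, captures
        pos = end                       # the next iteration starts where this one ended


end, captures = trace_dot_star_star("input")
print(end, captures)        # 5 ['input', '']
print(repr(captures[-1]))   # '' (the value of $1)
```

The capture list shows the value from step 2 (`'input'`) being overwritten by the empty capture from step 5 (`''`).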

A good question from the comments:

Then why does replace("input", "(input)*", "A$1B") return "AinputBAB"?

  1. (input)* matches "input". It is replaced by "AinputB".
  2. (input)* matches the empty string at the end of the string. It is replaced by "AB" ($1 is empty because the group didn't participate in this match).
  3. Result: "AinputBAB"
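The double match in the replacement examples comes from the global replace loop: after a match that ends at the end of the string, the engine tries once more at that position, where each of these patterns can match the empty string. The Python sketch below models that loop. `replace_all`, `star_literal`, `dot_star_star` and `dot_plus_star` are hypothetical toy helpers, not a real regex API. They assume no newlines in the input, that an empty match is allowed right after a non-empty one, that after an empty match the scan copies one character and moves on, and that `$1` expands to an empty string when group 1 did not participate.

```python
from typing import Callable, List, Optional, Tuple

# A toy matcher takes (text, pos) and returns (end, group1);
# group1 is None when the group did not participate in the match.
Matcher = Callable[[str, int], Tuple[int, Optional[str]]]


def star_literal(word: str) -> Matcher:
    """Toy matcher for (word)*, e.g. (input)*."""
    def match(text: str, pos: int) -> Tuple[int, Optional[str]]:
        group: Optional[str] = None
        while text.startswith(word, pos):  # repeat the literal greedily
            group = word
            pos += len(word)
        return pos, group
    return match


def dot_star_star(text: str, pos: int) -> Tuple[int, Optional[str]]:
    """Toy matcher for (.*)*: ends with a zero-length iteration that sets group 1 to ''."""
    while True:
        group = text[pos:]   # greedy .* runs to the end (no newlines assumed)
        if group == "":      # zero-length iteration: no progress, so stop
            return pos, group
        pos = len(text)


def dot_plus_star(text: str, pos: int) -> Tuple[int, Optional[str]]:
    """Toy matcher for (.+)*: .+ cannot match empty, so no zero-length iteration."""
    group: Optional[str] = None
    while pos < len(text):   # .+ needs at least one character
        group = text[pos:]
        pos = len(text)
    return pos, group


def replace_all(text: str, matcher: Matcher, template: str) -> str:
    """Toy global replace: '$1' in the template becomes group 1 ('' if unset)."""
    out: List[str] = []
    pos = 0
    while pos <= len(text):  # '<=' so an empty match at the very end is tried
        end, group = matcher(text, pos)
        out.append(template.replace("$1", group if group is not None else ""))
        if end == pos:       # empty match: copy one character and move on
            if pos < len(text):
                out.append(text[pos])
            pos += 1
        else:
            pos = end
    return "".join(out)


print(repr(replace_all("input", dot_star_star, "$1")))     # ''
print(replace_all("input", dot_star_star, "A$1B"))         # ABAB
print(replace_all("input", dot_plus_star, "$1"))           # input
print(replace_all("input", star_literal("input"), "A$1B")) # AinputBAB
```

In this model every call makes two matches: the first consumes "input" and the second is the empty match at the end of the string. Only group 1 differs: (.*)* leaves it as '' in both matches, while (.+)* and (input)* capture "input" in the first match and leave it unset in the second.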