Zero-Length regexes and infinite matches?
Problem description
In trying to elaborate an answer to this question, I am now trying to come to terms with the behavior/meaning of Zero-Length regular expressions.
I often use www.regexr.com as a playground to test/debug/understand what's going on in regular expressions.
So we have this most banal scenario:
The regex is a*
The input string is dgwawa
(As a matter of fact, the string here is irrelevant)
Why does the tester report that this regex will match infinitely, given that it can match zero occurrences of the preceding character?
Why can't the result be 6 matches, one for each character position (since at every character, regardless of whether it is an a or not, there is a match, since zero matches is a match)?
How does it get into matching infinitely? Does it not check/progress one character at a time?
I wonder how/where it gets itself into an infinite loop.
Answer
You selected the JavaScript regex flavor in the regexr.com online regex tester. With the g flag, the JavaScript RegExp exec method does not advance lastIndex on its own after a zero-length match, so a pattern that can match an empty string keeps matching at the same position.
That is why, when you need to emulate the behavior observed in .NET Regex.Matches, PHP preg_match_all, Python re.finditer, etc., you need to advance the index manually so that each position gets tested.
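To see the stuck index directly, here is a minimal sketch that calls exec a fixed number of times instead of looping until null. The variable name seen and the cap of three iterations are just illustrative choices, not part of the original answer.

```javascript
// Bounded demonstration: without a manual advance, exec() keeps
// returning the same empty match at index 0.
const re = /a*/g;
const str = 'dgwawa';
const seen = [];

// Cap the loop at 3 iterations; an uncapped while-loop would never end.
for (let i = 0; i < 3; i++) {
  const m = re.exec(str);
  // 'd' is not an 'a', so a* matches the empty string at position 0
  // and lastIndex is left at 0 for the next call.
  seen.push({ match: m[0], index: m.index, lastIndex: re.lastIndex });
}

console.log(seen);
```

Each of the three records has an empty match, index 0, and lastIndex 0, which is why an unguarded loop never moves past the first position.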
See regex101.com test:
var re = /a*/g;
var str = 'dgwawa';
var m;
while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) { // <- this part
        re.lastIndex++;             // <- here
    }                               // <- is important
    document.body.innerHTML += "'" + m[0] + "'<br/>";
}
If you remove that if block, you will get an infinite loop.
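The same guard can be wrapped in a small helper that runs outside a browser. This is only a sketch: findAllMatches is a hypothetical name, and copying the pattern into a fresh RegExp with the g flag is an assumption made here so the caller's lastIndex is left alone.

```javascript
// Hypothetical helper: collect every match of `pattern` in `input`,
// advancing lastIndex manually after zero-length matches.
function findAllMatches(pattern, input) {
  // Work on a copy with the g flag so the caller's regex is not mutated.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const results = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    // A zero-length match leaves lastIndex equal to m.index;
    // bump it by one so the next exec() starts at the next position.
    if (m.index === re.lastIndex) {
      re.lastIndex++;
    }
    results.push({ text: m[0], index: m.index });
  }
  return results;
}

const matches = findAllMatches(/a*/, 'dgwawa');
console.log(matches.map(function (x) { return "'" + x.text + "'"; }).join(' '));
// prints: '' '' '' 'a' '' 'a' ''
```

For the six-character input this yields seven matches, at indices 0 through 6: one per character position plus the empty match at the end of the string.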
There are two very important things to mention in this regard:
- Always use an online regex tester that is appropriate for your programming language
- Avoid using unanchored patterns that can match empty strings
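As an illustration of the second point, one possible rewrite (an assumption about what the pattern is meant to find, not something stated in the question) is to use a+ instead of a*. Because a+ can never match an empty string, the plain exec loop terminates without any lastIndex guard.

```javascript
// a+ requires at least one 'a', so every match consumes input
// and lastIndex always moves forward on its own.
const nonEmpty = /a+/g;
const text = 'dgwawa';
const found = [];
let hit;

while ((hit = nonEmpty.exec(text)) !== null) {
  found.push(hit[0]);
}

console.log(found); // [ 'a', 'a' ]
```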