Javascript 正则表达式挂起(使用 v8) [英] Javascript regex hangs (using v8)

查看:25
本文介绍了Javascript 正则表达式挂起(使用 v8)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用这个正则表达式来获取文件中标签的内容.

var regex = new RegExp("((?:.|\s)*)");

这会导致 v8 引擎无限期挂起.

现在,如果我使用 new RegExp("<tag:main>([sS]*)</tag:main>"),一切都很好.>

有人知道为什么第一个需要太长时间吗?

解决方案

这在最后一个关闭 </tag:main> 标签之后出现的长空格序列上发生了灾难性的回溯.考虑主题字符串以 100 个空格结尾的情况.首先,它将它们全部与交替左侧的 . 匹配.失败是因为没有结束标记,所以它尝试将最后一个字符与 s 匹配.这也失败了,因此它尝试将倒数第二个空格匹配为 s,将最后一个空格匹配为 ..那失败了(仍然没有结束标记)所以它尝试将最后一个空格作为 s.当失败时,它将倒数第三个空格作为 s 匹配,并尝试所有 4 种方法来匹配最后两个空格.如果失败,它会尝试将倒数第四个空格作为 s 以及最后 3 个空格上的所有 8 种方式.然后是 16、32 等.宇宙在到达倒数第 100 个空间之前就结束了.

由于灾难性的回溯,不同的 VM 对 regexp 匹配有不同的反应.有些人会简单地报告不匹配".在 V8 中,这就像编写任何其他无限或接近无限循环一样.

使用非贪婪的 * 会做你想做的事(你想停在第一个 </tag:main>,而不是最后一个),但会仍然对缺少结束序列的长字符串进行灾难性回溯.

确保内括号中的相同字符不能与交替的两边匹配,将把问题从指数一减少到一,在字符串的长度上是线性的.使用字符类代替交替或将 放在交替栏的右侧. 不相交. 因此,如果您遇到一长串空格,正则表达式引擎不会在终止之前尝试所有左右组合等.>

Im using this regex to get the contents of a tag in a file.

var regex = new RegExp("<tag:main>((?:.|\s)*)</tag:main>");

This causes the v8 engine to hang indefinitely.

Now, if I use new RegExp("<tag:main>([sS]*)</tag:main>"), all is good.

Anyone have an idea why the first one takes too long?

解决方案

This catastrophically backtracks on long sequences of spaces that occur after the last closing </tag:main> tag. Consider the case where the subject string ends with 100 spaces. First it matches them all with the . on the left of the alternation. That fails because there's no closing tag, so it tries matching the last character with the s instead. That fails too, so it tries matching the second-to-last space as a s and the last space as a .. That fails (still no closing tag) so it tries the last space as a s. When that fails it matches the third-to-last space as a s and tries all 4 ways to match the last two spaces. When that fails it tries the fourth-to-last space as a s and all 8 ways on the last 3 spaces. Then 16, 32 etc. The universe ends before it gets to the 100th-to-last space.

Different VMs have different reactions to regexp matches that take forever because of catastrophic backtracking. Some will simply report 'no match'. In V8 it's like writing any other infinite or near-infinite loop.

Using non-greedy * will do what you want (you want to stop at the first </tag:main>, not the last), but will still do catastrophic backtracking for long strings of spaces where the closing sequence is missing.

Making sure the same characters in the inner bracket can't match both sides of the alternation will reduce the problem from an exponential one to one that is linear in the length of the string. Use a character class instead of an alternation or put on the right hand side of the alternation bar. is disjoint with . so if you hit a long sequence of spaces the regexp engine doesn't try all left-right-left etc. combinations before terminating.

这篇关于Javascript 正则表达式挂起(使用 v8)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆