JavaScript regex hangs (using V8)

Question

I'm using this regex to get the contents of a tag in a file.

var regex = new RegExp("<tag:main>((?:.|\\s)*)</tag:main>");

This causes the v8 engine to hang indefinitely.

Now, if I use new RegExp("<tag:main>([\\s\\S]*)</tag:main>"), all is good.

Anyone have an idea why the first one takes too long?

Recommended answer

This catastrophically backtracks on long sequences of spaces that occur after the last closing </tag:main> tag. Consider the case where the subject string ends with 100 spaces. First it matches them all with the . on the left of the alternation. That fails because there's no closing tag after them, so it tries matching the last character with the \s instead. That fails too, so it tries matching the second-to-last space with \s and the last space with . again. That fails (still no closing tag), so it tries the last space with \s. When that fails, it matches the third-to-last space with \s and tries all 4 ways to match the last two spaces. When that fails, it tries the fourth-to-last space with \s and all 8 ways on the last 3 spaces. Then 16, 32, etc.: every extra trailing space doubles the work, so 100 spaces mean on the order of 2^100 attempts. The universe ends before it gets to the 100th-to-last space.
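A rough way to see the doubling for yourself is to time the question's pattern (written here as a regex literal) against inputs that open the tag but never close it. This is a sketch that assumes Node.js 16+ or a browser for the global performance.now(); timeMatch is a hypothetical helper, and real timings depend on your machine and V8 version, so none are shown.

```javascript
// The pattern from the question, as a regex literal: . and \s both match a space,
// so each trailing space can be consumed two different ways.
const bad = /<tag:main>((?:.|\s)*)<\/tag:main>/;

// Hypothetical helper: run re against "<tag:main>" followed by n spaces and
// no closing tag, and report whether it matched and how long it took.
function timeMatch(re, n) {
  const input = "<tag:main>" + " ".repeat(n); // closing tag deliberately missing
  const start = performance.now();
  const matched = re.exec(input) !== null;
  const ms = performance.now() - start;
  return { n, matched, ms: Number(ms.toFixed(2)) };
}

// Keep n small: each extra space roughly doubles the running time.
for (let n = 14; n <= 20; n += 2) {
  console.log(timeMatch(bad, n));
}
```

Once n is large enough for the timings to rise above noise, each step of 2 should take roughly four times as long as the one before; pushing n much higher quickly turns into the hang described in the question.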

Different VMs have different reactions to regexp matches that take forever because of catastrophic backtracking. Some will simply report 'no match'. In V8 it's like writing any other infinite or near-infinite loop.

Using a non-greedy *? will do what you want (you want to stop at the first </tag:main>, not the last), but it will still backtrack catastrophically on long strings of spaces when the closing sequence is missing.
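A small sketch of the difference between the greedy and lazy forms on a string containing two tags (the sample string is made up for illustration):

```javascript
// Greedy: (?:.|\s)* runs to the end and backtracks to the LAST closing tag.
const greedy = /<tag:main>((?:.|\s)*)<\/tag:main>/;
// Lazy: (?:.|\s)*? stops at the FIRST closing tag it can reach.
const lazy = /<tag:main>((?:.|\s)*?)<\/tag:main>/;

const doc = "<tag:main>first</tag:main>\n<tag:main>second</tag:main>";

console.log(JSON.stringify(greedy.exec(doc)[1])); // "first</tag:main>\n<tag:main>second"
console.log(JSON.stringify(lazy.exec(doc)[1]));   // "first"

// The lazy form still has two ways to consume every space, so an input with a
// long run of spaces and no closing tag remains exponential (keep n tiny here).
console.log(lazy.exec("<tag:main>" + " ".repeat(12))); // null
```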

Making sure the same characters in the inner group can't match both sides of the alternation reduces the problem from an exponential one to one that is linear in the length of the string. Use a character class such as [\s\S] instead of an alternation, or replace \s with \n on the right-hand side of the alternation bar, giving (?:.|\n). \n is disjoint with ., so if you hit a long sequence of spaces the regexp engine doesn't try all left-right-left etc. combinations before terminating.
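Here is a sketch of those rewrites, each run against a long run of spaces with no closing tag; the input length of 20,000 is arbitrary and would be hopeless for the original pattern. The dotAll variant uses the s flag (ES2018), an alternative the answer itself doesn't mention. Note also that . in JavaScript excludes \r, \u2028 and \u2029 as well as \n, so (?:.|\n) refuses to match across Windows-style \r\n line endings, while [\s\S] has no such gap. It assumes Node.js 16+ or a browser for performance.now().

```javascript
// Each pattern gives every character exactly one way to be consumed,
// so a failed match costs time linear in the input instead of exponential.
const fixes = {
  charClass: /<tag:main>([\s\S]*?)<\/tag:main>/,      // one class, no alternation
  dotOrNewline: /<tag:main>((?:.|\n)*?)<\/tag:main>/, // . and \n never overlap
  dotAll: /<tag:main>(.*?)<\/tag:main>/s,             // s flag: . also matches line terminators
};

const noClose = "<tag:main>" + " ".repeat(20000); // closing tag missing
const crlf = "<tag:main>a\r\nb</tag:main>";       // Windows-style line ending inside the tag

for (const [name, re] of Object.entries(fixes)) {
  const start = performance.now();
  const missing = re.exec(noClose); // fails quickly instead of hanging
  const ms = (performance.now() - start).toFixed(2);
  const m = re.exec(crlf); // dotOrNewline gives null here because . skips \r
  console.log(name, missing, ms + " ms", m ? JSON.stringify(m[1]) : null);
}
```

If you control the regex flags, the s flag is the most direct fix; [\s\S] is the portable choice for engines that predate ES2018.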