Regular expression very slow on fail
Problem description
I have a regular expression that should validate whether a string is composed of space-delimited strings. The regular expression works well (OK, it allows a trailing space, but that's not the problem), but it takes too long when the validation fails.
The regular expression is the following:
/^(([\w\-]+)( )?){0,}$/
When trying to validate the string
"'this-is_SAMPLE-scope-123,this-is_SAMPLE-scope-456'"
it takes 2 seconds.
The tests were performed in Ruby 1.9.2-rc1 and 1.8.7, but this is probably a general problem.
Any ideas?
Recommended answer
Your pattern causes catastrophic backtracking. The catastrophic part can be summarized as:
(.+)*
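The + and the * interact in catastrophic ways in some engines: when the overall match fails, a backtracking engine retries every way of dividing the same characters between the inner + and the outer *.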
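To get a feel for why the nested quantifiers blow up, here is a rough sketch that counts how many ways a run of word characters can be cut into consecutive chunks, one chunk per iteration of ([\w\-]+)( )? when no space is present. It is a simplified model of the search space, not a trace of any real engine, and the helper name count_splits is made up for illustration.

```ruby
# Number of ways to cut a run of n word characters into consecutive,
# non-empty chunks. Each chunk stands for one iteration of ([\w\-]+)( )?
# when the run contains no spaces. (Hypothetical helper, for illustration.)
def count_splits(n, memo = { 0 => 1 })
  # Pick the length of the first chunk, then count the splits of the rest.
  memo[n] ||= (1..n).sum { |first_chunk| count_splits(n - first_chunk, memo) }
end

# The run of word characters that precedes the comma in the sample string.
run = 'this-is_SAMPLE-scope-123'

[4, 8, 16, run.length].each do |n|
  puts "#{n} characters: #{count_splits(n)} possible splits"
end
```

The count doubles with every extra character, so a failing match gets dramatically slower as the run before the offending character grows.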
It's unclear exactly what you're trying to match, but it may be something like this:
^[\w\-]+( [\w\-]+)*$
This matches (as seen on rubular.com):
hello world
99 bottles of beer on the wall
this_works_too
And rejects:
not like this, not like this
hey what the &#@!
too  many  spaces
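The same check can be run locally instead of on rubular.com. This is a minimal sketch: the constant names are made up for illustration, and the last rejected sample is written with doubled spaces on the assumption that this is what "too many spaces" was meant to show.

```ruby
# Rewritten pattern from the answer: one word, then zero or more
# " word" groups. There is only one way to split any input, so a
# failing match is reported without exponential backtracking.
SPACE_DELIMITED = /^[\w\-]+( [\w\-]+)*$/

# Single-quoted literals so that "#@" is not treated as interpolation.
ACCEPTED = ['hello world', '99 bottles of beer on the wall', 'this_works_too']
REJECTED = ['not like this, not like this', 'hey what the &#@!', 'too  many  spaces']

(ACCEPTED + REJECTED).each do |text|
  verdict = SPACE_DELIMITED.match?(text) ? 'match' : 'no match'
  puts "#{text.inspect}: #{verdict}"
end
```

Unlike the original pattern, this one also rejects a trailing space and the empty string.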
Another option would be to use possessive quantifiers and/or atomic groupings in parts of the original pattern.
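As a sketch of what that could look like, here are two variants of the original pattern, one with an atomic group and one with a possessive quantifier. The constant names are illustrative, and the syntax assumes the regex engine shipped with Ruby 1.9 or later; older engines may not accept the possessive form.

```ruby
# Atomic group: once an iteration has consumed a word and its optional
# space, the engine may not go back inside it to try a shorter word.
ATOMIC = /^(?>[\w\-]+ ?)*$/

# Possessive quantifier: "++" never gives back characters it has matched,
# so each word is consumed whole or not at all.
POSSESSIVE = /^(?:[\w\-]++ ?)*$/

failing = 'this-is_SAMPLE-scope-123,this-is_SAMPLE-scope-456'

[ATOMIC, POSSESSIVE].each do |pattern|
  puts "#{pattern.source}"
  puts "  'hello world'  -> #{pattern.match?('hello world')}"
  puts "  'hello world ' -> #{pattern.match?('hello world ')}"
  puts "  failing sample -> #{pattern.match?(failing)}"
end
```

Both variants keep the behaviour of the original pattern, including the allowed trailing space, while removing the alternative ways of splitting a word that made the failing case slow.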