Regular expression hangs program (100% CPU usage)
Problem description
Java is hanging with 100% CPU usage when I use the below string as input for a regular expression.
RegEx used:
Here is the regular expression used for the description field in my application.
^([A-Za-z0-9\\-\\_\\.\\&\\,]+[\\s]*)+
String used for testing:
SaaS Service VLAN from Provider_One
2nd attempt with Didier SPT because the first one he gave me was wrong :-(
It works properly when I split the same string into smaller pieces, such as "SaaS Service VLAN from Provider_One" or "first one he gave me was wrong :-(". Java hangs only on the complete string given above.
I also tried optimizing the regex as below.
^([\\w\\-\\.\\&\\,]+[\\s]*)+
Even this does not work.
Recommended answer
You have nested quantifiers that cause a gigantic number of permutations to be checked when the regex arrives at the : in your input string, which is not part of your character class (assuming you're using the .matches() method).
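Why the .matches() assumption matters can be seen in a minimal sketch. The class and method names below are made up for illustration, and the input is deliberately short so that backtracking stays cheap. Matcher.lookingAt() only needs a prefix to match, so it can stop in front of the : and succeed, while Matcher.matches() must consume the whole input and therefore has to try every alternative before giving up.

```java
import java.util.regex.Pattern;

class MatchModeDemo {
    // The regex from the question, written as a Java string literal.
    static final Pattern DESCRIPTION =
            Pattern.compile("^([A-Za-z0-9\\-\\_\\.\\&\\,]+[\\s]*)+");

    // True if the regex matches some prefix of the input.
    static boolean prefixMatches(String input) {
        return DESCRIPTION.matcher(input).lookingAt();
    }

    // True only if the regex consumes the entire input.
    static boolean wholeInputMatches(String input) {
        return DESCRIPTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        String shortInput = "wrong :-("; // hypothetical short input, not the full test string
        System.out.println(prefixMatches(shortInput));     // prints "true"
        System.out.println(wholeInputMatches(shortInput)); // prints "false"
    }
}
```

Only the second call has to explore the alternative ways of dividing "wrong" between repetitions of the group, which is where the cost comes from on longer inputs.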
Let's simplify the problem to this regex:
^([^:]+)+$
and this string:
1234:
The regex engine needs to check
1234 # no repetition of the capturing group
123 4 # first repetition of the group: 123; second repetition: 4
12 34 # etc.
12 3 4
1 234
1 23 4
1 2 34
1 2 3 4
...and that's just for four characters. On your sample string, RegexBuddy aborts after 1 million attempts. Java will happily keep on chugging... before finally admitting that none of these combinations allows the following : to match.
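The enumeration above can be reproduced with a small sketch. This is not how the regex engine is implemented; it is a hypothetical helper (SplitDemo and splits are names invented here) that lists every way of cutting a string into consecutive non-empty pieces, trying the longest piece first as a greedy quantifier would. A string of n characters has 2^(n-1) such splits, so each extra character doubles the work.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Lists every way to cut s into consecutive non-empty pieces,
    // with the pieces of each split separated by a single space.
    static List<String> splits(String s) {
        List<String> out = new ArrayList<>();
        if (!s.isEmpty()) {
            collect(s, 0, "", out);
        }
        return out;
    }

    private static void collect(String s, int start, String prefix, List<String> out) {
        if (start == s.length()) {
            out.add(prefix);
            return;
        }
        // Longest piece first, mimicking the order a greedy quantifier tries.
        for (int end = s.length(); end > start; end--) {
            String piece = s.substring(start, end);
            collect(s, end, prefix.isEmpty() ? piece : prefix + " " + piece, out);
        }
    }

    public static void main(String[] args) {
        for (String split : splits("1234")) {
            System.out.println(split);
        }
        System.out.println(splits("1234").size());       // prints "8"
        System.out.println(splits("1234567890").size()); // prints "512"
    }
}
```

For "1234" this prints the same eight lines, in the same order, as the list above.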
How can you solve this?
Using possessive quantifiers
^([A-Za-z0-9_.&,-]++\\s*+)+
will allow the regex to fail faster. Incidentally, I removed all those unnecessary backslashes.
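A minimal sketch of the possessive version in Java follows. The class and method names are hypothetical, and it assumes the two lines of the test string are joined by a single newline, which the question does not state explicitly.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Possessive quantifiers (++ and *+) never give back characters once
    // matched, so the engine cannot retry different splits of a word.
    static final Pattern DESCRIPTION =
            Pattern.compile("^([A-Za-z0-9_.&,-]++\\s*+)+");

    static boolean isValidDescription(String input) {
        return DESCRIPTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Assumption: the two lines from the question are separated by "\n".
        String sample = "SaaS Service VLAN from Provider_One\n"
                + "2nd attempt with Didier SPT because the first one he gave me was wrong :-(";
        System.out.println(isValidDescription(sample));                                // prints "false"
        System.out.println(isValidDescription("SaaS Service VLAN from Provider_One")); // prints "true"
    }
}
```

The result for the sample string is still a non-match, because : and ( are not in the character class; the difference is that the failure is reported after a single pass over the input instead of after an exponential search.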
Edit:
Some measurements:
On the string "was wrong :-)", it takes RegexBuddy 862 steps to figure out a non-match.
For "me was wrong :-)", it's 1,742 steps.
For "gave me was wrong :-)", 14,014 steps.
For "he gave me was wrong :-)", 28,046 steps.
For "one he gave me was wrong :-)", 112,222 steps.
For "first one he gave me was wrong :-)", >1,000,000 steps.