Regular expression hangs program (100% CPU usage)

Question

Java hangs with 100% CPU usage when I use the string below as input for a regular expression.

Regex used:

Here is the regular expression used for the description field in my application.

^([A-Za-z0-9\\-\\_\\.\\&\\,]+[\\s]*)+

String used for testing:

SaaS Service VLAN from Provider_One
2nd attempt with Didier SPT because the first one he gave me was wrong :-(

It works properly when I split the same string into different combinations, such as "SaaS Service VLAN from Provider_One", "first one he gave me was wrong :-(", etc. Java hangs only for the string given above.

I also tried optimizing the regex as below.

^([\\w\\-\\.\\&\\,]+[\\s]*)+

Even this does not work.

Recommended answer

Another classic case of catastrophic backtracking.

You have nested quantifiers that cause a gigantic number of permutations to be checked when the regex arrives at the : in your input string, which is not part of your character class (assuming you're using the .matches() method).

Let's simplify the problem to this regex:

^([^:]+)+$

and this string:

1234:

The regex engine needs to check

1234    # no repetition of the capturing group
123 4   # first repetition of the group: 123; second repetition: 4
12 34   # etc.
12 3 4 
1 234
1 23 4
1 2 34
1 2 3 4

...and that's just for four characters. On your sample string, RegexBuddy aborts after 1 million attempts. Java will happily keep on chugging before finally admitting that none of these combinations allows the following : to match.
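The list of splits above can be reproduced with a small sketch. It does not run a regex; it only enumerates every way to cut a string into consecutive non-empty pieces, which is the set of group repetitions a backtracking engine may end up trying for ^([^:]+)+$. The class and method names (GroupSplits, splits) are made up for illustration. A string of n characters has 2^(n-1) such splits, so each extra character doubles the count.

```java
import java.util.ArrayList;
import java.util.List;

class GroupSplits {
    // Lists every way to cut s into consecutive non-empty pieces,
    // with the pieces of one split joined by single spaces.
    static List<String> splits(String s) {
        List<String> out = new ArrayList<>();
        collect(s, 0, "", out);
        return out;
    }

    private static void collect(String s, int start, String prefix, List<String> out) {
        if (start == s.length()) {
            out.add(prefix);
            return;
        }
        // Try the longest piece first, mirroring a greedy quantifier
        // that gives back one character at a time.
        for (int end = s.length(); end > start; end--) {
            String piece = s.substring(start, end);
            collect(s, end, prefix.isEmpty() ? piece : prefix + " " + piece, out);
        }
    }

    public static void main(String[] args) {
        List<String> all = splits("1234");
        all.forEach(System.out::println);
        System.out.println(all.size() + " splits"); // 8 for four characters
    }
}
```

For a 20-character run the same enumeration already has 524,288 entries, which is why the step counts explode on longer input.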

How can you solve this?

Using possessive quantifiers

^([A-Za-z0-9_.&,-]++\\s*+)+

will allow the regex to fail faster. Incidentally, I removed all those unnecessary backslashes.
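As a minimal sketch of how the possessive version might be used, the following wraps it in a validation method and calls .matches() as the answer assumes. The class name DescriptionCheck, the method isValid, and the "\n" joining the two lines of the test string are assumptions for illustration, not taken from the question.

```java
import java.util.regex.Pattern;

class DescriptionCheck {
    // Regex from the answer; "++" and "*+" are possessive quantifiers,
    // so text they have consumed is never given back for retrying.
    static final Pattern DESCRIPTION =
            Pattern.compile("^([A-Za-z0-9_.&,-]++\\s*+)+");

    static boolean isValid(String description) {
        // matches() succeeds only if the whole input matches the pattern.
        return DESCRIPTION.matcher(description).matches();
    }

    public static void main(String[] args) {
        // Assumption: the two lines of the test string are separated by "\n".
        String sample = "SaaS Service VLAN from Provider_One\n"
                + "2nd attempt with Didier SPT because the first one he gave me was wrong :-(";

        System.out.println(isValid("SaaS Service VLAN from Provider_One")); // true
        System.out.println(isValid(sample)); // false, because ':' is not in the character class
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on every validation call.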

Edit:

Some measurements:

On the string "was wrong :-)", it takes RegexBuddy 862 steps to figure out a non-match.
For "me was wrong :-)", it's 1,742 steps.
For "gave me was wrong :-)", 14,014 steps.
For "he gave me was wrong :-)", 28,046 steps.
For "one he gave me was wrong :-)", 112,222 steps.
For "first one he gave me was wrong :-)", >1,000,000 steps.
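A rough way to observe similar growth in Java itself is to wrap the input in a CharSequence that counts how often the engine reads a character. This is only a proxy: the numbers will not equal RegexBuddy's step counts and may vary between JDK versions. The names StepCounter, CountingSequence, and reads are hypothetical, and the sketch deliberately runs the original regex only on the three shortest strings.

```java
import java.util.regex.Pattern;

class StepCounter {
    // Original regex from the question and the possessive version from the answer.
    static final Pattern ORIGINAL =
            Pattern.compile("^([A-Za-z0-9\\-\\_\\.\\&\\,]+[\\s]*)+");
    static final Pattern POSSESSIVE =
            Pattern.compile("^([A-Za-z0-9_.&,-]++\\s*+)+");

    // CharSequence wrapper that counts every character read by the matcher.
    static final class CountingSequence implements CharSequence {
        private final CharSequence text;
        long reads;

        CountingSequence(CharSequence text) {
            this.text = text;
        }

        @Override
        public char charAt(int index) {
            reads++;
            return text.charAt(index);
        }

        @Override
        public int length() {
            return text.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Returns the number of charAt() calls made while attempting a full match.
    static long reads(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).matches();
        return seq.reads;
    }

    public static void main(String[] args) {
        String[] inputs = {"was wrong :-)", "me was wrong :-)", "gave me was wrong :-)"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\": original=" + reads(ORIGINAL, input)
                    + ", possessive=" + reads(POSSESSIVE, input));
        }
    }
}
```

The same wrapper idea is sometimes extended to throw an exception once a read budget is exceeded, which gives a crude guard against runaway patterns.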
