Regular expression hangs program (100% CPU usage)

Views: 137 Published: 2018/12/26 14:25:40 java regex


Problem description

Java hangs with 100% CPU usage when I use the string below as input for a regular expression.

RegEx used:

Here is the regular expression used for the description field in my application.

^([A-Za-z0-9\\-\\_\\.\\&\\,]+[\\s]*)+

String used for testing:

SaaS Service VLAN from Provider_One
2nd attempt with Didier SPT because the first one he gave me was wrong :-(

It works properly when I split the same string into smaller pieces, such as "SaaS Service VLAN from Provider_One" or "first one he gave me was wrong :-(". Java hangs only on the full string given above.

I also tried optimizing the regex as below.

^([\\w\\-\\.\\&\\,]+[\\s]*)+

Even this didn't work.

Recommended answer

Another classic case of catastrophic backtracking.

You have nested quantifiers that cause a gigantic number of permutations to be checked when the regex arrives at the : in your input string, which is not part of your character class (assuming you're using the .matches() method).

Let's simplify the problem to this regex:
^([^:]+)+$

and this string:
1234:

The regex engine needs to check
1234    # no repetition of the capturing group
123 4   # first repetition of the group: 123; second repetition: 4
12 34   # etc.
12 3 4 
1 234
1 23 4
1 2 34
1 2 3 4

...and that's just for four characters. On your sample string, RegexBuddy aborts after 1 million attempts. Java will happily keep on chugging... before finally admitting that none of these combinations allows the following : to match.
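
To make the growth concrete, here is a small sketch that enumerates the ways "1234" can be cut into consecutive non-empty pieces, which is the set of group repetitions listed above. It is only an illustration of the counting argument, not a model of how java.util.regex is implemented; the class and method names (SplitEnumerator, splits) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SplitEnumerator {
    /** Returns every way to cut s into consecutive, non-empty pieces. */
    static List<String> splits(String s) {
        List<String> out = new ArrayList<>();
        collect(s, 0, "", out);
        return out;
    }

    private static void collect(String s, int start, String prefix, List<String> out) {
        if (start == s.length()) {
            out.add(prefix);
            return;
        }
        // Try the longest piece first, mirroring a greedy quantifier
        // that gives characters back one at a time.
        for (int end = s.length(); end > start; end--) {
            String piece = s.substring(start, end);
            collect(s, end, prefix.isEmpty() ? piece : prefix + " " + piece, out);
        }
    }

    public static void main(String[] args) {
        for (String split : splits("1234")) {
            System.out.println(split);
        }
        // A string of n characters has 2^(n-1) such splits.
        System.out.println(splits("1234").size() + " splits");
    }
}
```

Each extra character doubles the number of splits, which is why a description of a few dozen characters is already out of reach for a backtracking engine that has to try them all.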
How can you solve this?

You can use possessive quantifiers:
^([A-Za-z0-9_.&,-]++\\s*+)+

This will allow the regex to fail faster. Incidentally, I removed all those unnecessary backslashes.
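
A minimal sketch of how the possessive version might be wired into Java, assuming (as above) that the field is validated with .matches(). The class name PossessiveDescriptionCheck and the method isValid are hypothetical; the pattern and the test string are the ones from the question.

```java
import java.util.regex.Pattern;

class PossessiveDescriptionCheck {
    // Possessive quantifiers (++ and *+) never give characters back,
    // so each repetition of the group can match in only one way.
    static final Pattern DESCRIPTION =
            Pattern.compile("^([A-Za-z0-9_.&,-]++\\s*+)+");

    /** Assumed validation entry point: the whole input must match. */
    static boolean isValid(String description) {
        return DESCRIPTION.matcher(description).matches();
    }

    public static void main(String[] args) {
        String problem = "SaaS Service VLAN from Provider_One\n"
                + "2nd attempt with Didier SPT because the first one he gave me was wrong :-(";

        // Contains only allowed characters, so it is accepted.
        System.out.println(isValid("SaaS Service VLAN from Provider_One"));
        // Contains ':' and '(', which are not in the character class,
        // so it is rejected, and the rejection comes back promptly.
        System.out.println(isValid(problem));
    }
}
```

Note that the string is still rejected; the possessive quantifiers only change how quickly the engine reaches that conclusion. If characters such as : and ( should be accepted, they have to be added to the character class.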
Edit:

Some measurements:

On the string "was wrong :-)", it takes RegexBuddy 862 steps to figure out a non-match.

For "me was wrong :-)", it's 1,742 steps.

For "gave me was wrong :-)", 14,014 steps.

For "he gave me was wrong :-)", 28,046 steps.

For "one he gave me was wrong :-)", 112,222 steps.

For "first one he gave me was wrong :-)", >1,000,000 steps.  
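
If you don't have RegexBuddy at hand, a rough comparison can be made in plain Java by wrapping the input in a CharSequence that counts how often the engine reads a character. This is only a proxy: the counts are not RegexBuddy steps, they will not match the figures above, and they vary between JDK versions. The names BacktrackingProbe, CountingSequence and readsFor are invented for this sketch, and the nested pattern is the question's regex with the redundant backslashes removed.

```java
import java.util.regex.Pattern;

class BacktrackingProbe {
    /** CharSequence wrapper that counts how often a character is read. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence text;
        long reads = 0;

        CountingSequence(CharSequence text) {
            this.text = text;
        }

        @Override public int length() {
            return text.length();
        }

        @Override public char charAt(int index) {
            reads++;
            return text.charAt(index);
        }

        @Override public CharSequence subSequence(int start, int end) {
            return new CountingSequence(text.subSequence(start, end));
        }

        @Override public String toString() {
            return text.toString();
        }
    }

    /** Runs matches() and returns the number of character reads it took. */
    static long readsFor(String regex, String input) {
        CountingSequence seq = new CountingSequence(input);
        Pattern.compile(regex).matcher(seq).matches();
        return seq.reads;
    }

    public static void main(String[] args) {
        String nested = "^([A-Za-z0-9_.&,-]+\\s*)+";
        String possessive = "^([A-Za-z0-9_.&,-]++\\s*+)+";
        String[] inputs = {"was wrong :-)", "me was wrong :-)", "gave me was wrong :-)"};

        for (String input : inputs) {
            System.out.println("\"" + input + "\": nested=" + readsFor(nested, input)
                    + " reads, possessive=" + readsFor(possessive, input) + " reads");
        }
    }
}
```

Only short inputs are used here on purpose, so the nested pattern finishes quickly whichever JDK runs it.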
