匹配正则表达式时程序永远运行 [英] Program run forever when matching regex

查看:150
本文介绍了匹配正则表达式时程序永远运行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我不知道为什么,但是当我尝试运行这个程序时,看起来该程序将永远运行。

I don't know why but when I try to run this program, it's looks like the program will run forever.

package fjr.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test3 {

    public static void main(String[] args){

        String regex = "dssdfsdfdsf wdasdads dadlkn mdsds .";

        Pattern p  = Pattern.compile("^([a-zA-Z]+ *)+$"); 

        Matcher match =  p.matcher(regex); 

        if(match.matches()){
            System.out.println("Yess"); 
        }else{
            System.out.println("No.. "); 
        }

        System.out.println("FINISH..."); 
    }

}

我需要做的是匹配包含一串只用空格分隔的单词的模式

What I need to do was to match the pattern that contain a bunch of word that only separated by spaces

推荐答案

您的程序可能遇到了所谓的灾难性的回溯

Your program has likely encountered what's called catastrophic backtracking.

快速复习:正则表达式如何工作:状态机始终从左到右读取,必要时回溯。

在左侧,我们有我们的模式:

On the left hand side, we have our pattern:

/^([a-zA-Z]+ *)+$/

这里是匹配的字符串:

dssdfsdfdsf wdasdads dadlkn mdsds .

从regex101调试器中,你的正则表达式需要78540步才能失败。这是因为您使用的是贪婪不占有(回溯)的量词。

From the regex101 debugger, your regex took 78540 steps to fail. This is because you used quantifiers that are greedy and not possessive (backtracking).

...长话短说,因为输入字符串无法匹配,正则表达式中的每个量词导致无限回溯 - 每个字符都从 + 中释放,然后 * 然后两个然后从()+ 释放一个组以回溯更多。

... Long story short, because the input string fails to match, every quantifier within your regex causes indefinite backtracking - Every character is released from + and then * and then both and then a group is released from ()+ to backtrack more.

这里有一些你应该遵循的解决方案:

Here's a few solutions you should follow:

如果您修改表达式,您将看到该模式在逻辑上与以下相同:

If you revise your expression, you'll see that the pattern is logically same as:

/^[a-zA-Z]+( +[a-zA-Z]+)*$/

这使用了一个步骤逻辑归纳以减少楼上的正则表达式以更快地匹配,现在在97步!

This uses a step of logical induction to reduce the regex upstairs to match far quicker, now at 97 steps!

正如我所提到的, / ^([a -zA-Z] + *)+ $ / 是邪恶的,因为它以可怕的方式回溯。我们是Java,我们能做什么?

As I mentioned, /^([a-zA-Z]+ *)+$/ is evil because it backtracks in a terrible manner. We're in Java, what can we do?

此解决方案仅适用于 [a-zA-Z] 匹配不同的项目。我们可以使用占有团体!

This solution works only because [a-zA-Z] and matches distinct items. We can use a possessive group!

/^([a-zA-Z]++ *+)++$/
            ^  ^  ^

这些简单的 + 表示如果我们从这里失败,我们就不会回溯。这是一个非常有效的解决方案,并且不再需要回溯。每当你有两个不同的组,其间有量词时,请使用它们。如果您需要一些有效性的证明,这是我们的记分卡:

These simple "+" denotes "We're not backtracking if we fail the match from here". This is an extremely effective solution, and cuts off any need for backtracking. Whenever you have two distinct groups with a quantifier in between, use them. And if you need some proof on the effectiveness, here's our scorecard:

另请阅读:

  • The Stack Overflow Regex Reference
  • ReDoS - Wikipedia

在线演示:

  • RegEx demo 1
  • RegEx demo 2

这篇关于匹配正则表达式时程序永远运行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆