正则表达式回溯直到Java中的溢出 [英] Regular Expression backtracks until overflow in Java

查看:250
本文介绍了正则表达式回溯直到Java中的溢出的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下表达式:

^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[\/\/]*\s*(end of)*\s*FEATURE)+?$

运行我编译的.Jar文件时覆盖匹配缓冲区。

Overrides the matching buffer when running my compiled .Jar file.

匹配字符串可以类似于:

The matching string can be similar to:


这是一条垃圾线

this is a junk line

#ifdef功能

#endif //功能结束

这是一个垃圾line

this is a junk line

#ifdef FEATURE

这是一条垃圾线应该匹配:HOLasduiqwhei& // FEATURE fjfefj
#endif // h

#endif FEATURE

这是垃圾行

因此,粗体字符串应匹配。错误如下:

So, the bold strings should match. The error is as follows:

   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)

任何回溯避免策略/改进表达式都是受欢迎的。我已经尝试过原子组(?>)但由于某种原因没有简化。

Any backtracking avoiding strategy/improvement of the expression is welcome. I have tried the atomic groups (?>) but doesn't simplify, for some reason.

代码如下:

public String strip(String text){

public String strip(String text) {

    ArrayList<String> patterns=new ArrayList<String>();
    patterns=readFile("Disabled_Features.txt");
    for(int i = 0; i < patterns.size(); ++i)
    {

      Pattern todoPattern = Pattern.compile("^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$",Pattern.MULTILINE); 

      Matcher m = todoPattern.matcher(text);
      text = m.replaceAll("");
    }
    return text;        
}


推荐答案

我试过编写的代码@Wiktor,效果很好

I have tried the code written by @Wiktor and works pretty well

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {
  public static void main(String[] args) {
    String text = "this is a junk line\n" + 
        "\n" + 
        "#ifdef FEATURE \n" + 
        "#endif // end of FEATURE\n" + 
        "\n" + 
        "this is a junk line\n" + 
        "\n" + 
        "#ifdef FEATURE\n" + 
        "\n" + 
        "this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n" + 
        "\n" + 
        "#endif FEATURE\n" + 
        "\n" + 
        "this is a junk line";

    // this version does not use Pattern.MULTILINE, this should reduce the backtraking
    Matcher matcher2 = Pattern.compile("\\n#ifdef FEATURE((?:\\r?\\n(?!#endif (?:// end of )?FEATURE).*)*)\\r?\\n#endif (?:// end of )?FEATURE").matcher(text);
    while (matcher2.find()) {
      System.out.println(matcher2.group());
    }

  }
}

这让我认为你的问题是由于输入文件的大小。

This let me to think that your problem is due to the size of input file.

所以,如果你的文件太大,你可以将输入实现为 CharSequence ,这样你就可以包装你的大文本文件了。为什么?因为从模式构建匹配器需要 CharSequence 一个论点。

So, in case your file is too big, you could implement the input as a CharSequence, so you could wrap your large text files. Why? Because building a Matcher from a Pattern takes a CharSequence as an argument.

https://github.com/fge/largetext

这篇关于正则表达式回溯直到Java中的溢出的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆