My regex is causing a stack overflow in Java; what am I missing?

Question

I am attempting to use a regular expression with Scanner to match a string from a file. The regex works on all of the file's contents except for this line:

DNA="ITTTAITATIATYAAAYIYI[....]ITYTYITTIYAIAIYIT"

In the actual file, the ellipsis represents several thousand more characters.

When the loop that reads the file reaches the line containing the bases, a stack overflow error occurs.

Here is the loop:

while (scanFile.hasNextLine()) {
    final String currentLine = scanFile.findInLine(".*");
    System.out.println("trying to match '" + currentLine + "'");
    Scanner internalScanner = new Scanner(currentLine);
    String matchResult = internalScanner.findInLine(Constants.ANIMAL_INFO_REGEX);
    assert matchResult != null : "there's no reason not to find a match";
    matches.put(internalScanner.match().group(1), internalScanner.match().group(2));
    scanFile.nextLine();
}

And the regex:

static final String ANIMAL_INFO_REGEX = "([a-zA-Z]+) *= *\"(([a-zA-Z_.]| |\\.)+)";

Here is the stack trace:

java.lang.StackOverflowError
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3360)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
    ...etc (it's all regex).

Thanks very much!

Recommended answer

This looks like bug 5050507. I agree with Asaph that removing the alternation should help; the bug specifically says "Avoid alternation whenever possible". I think you can probably go even simpler:

"^([a-zA-Z]+) *= *\"([^\"]+)"

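To illustrate the suggestion, here is a minimal sketch rather than the asker's actual program. It assumes plain Pattern/Matcher in place of Scanner, and the class name DnaLineParser, the method parse, and the constant names are hypothetical. It shows two alternation-free patterns: one that folds the original alternatives into a single character class, and the simpler one from this answer. Both are run against a synthetic 100,000-character line built only for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DnaLineParser {
    // The original alternation (([a-zA-Z_.]| |\\.)+) folded into one character class.
    static final Pattern MERGED_CLASS =
            Pattern.compile("^([a-zA-Z]+) *= *\"([a-zA-Z_. ]+)");

    // The simpler pattern from the answer: everything up to the next double quote.
    static final Pattern UP_TO_QUOTE =
            Pattern.compile("^([a-zA-Z]+) *= *\"([^\"]+)");

    // Returns {key, value} for a line such as KEY="VALUE", or null if nothing matches.
    static String[] parse(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for the long DNA line; the real file contents are not known.
        StringBuilder sb = new StringBuilder("DNA=\"");
        String bases = "ITAY";
        for (int i = 0; i < 100_000; i++) {
            sb.append(bases.charAt(i % bases.length()));
        }
        sb.append('"');
        String line = sb.toString();

        for (Pattern p : new Pattern[] { MERGED_CLASS, UP_TO_QUOTE }) {
            String[] kv = parse(p, line);
            System.out.println(kv[0] + " -> " + kv[1].length() + " characters");
        }
    }
}
```

Note that the second pattern is more permissive than the original: it accepts any character other than a double quote inside the value, whereas the merged character class keeps the original set of letters, underscore, period, and space.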