java.lang.StackOverflowError while using a regex to parse big strings
Problem description
This is my regex:
((?:(?:'[^']*')|[^;])*)[;]
It tokenizes a string on semicolons. For example,
Hello world; I am having a problem; using regex;
results in three strings:
Hello world
I am having a problem
using regex
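The code that applies the pattern is not shown in the question, so the following is only a minimal sketch of how it might be used. The class name RegexTokenizerDemo, the method tokenize, and the trim() call (which drops the space that follows each semicolon) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the asker's actual code is not shown.
class RegexTokenizerDemo {
    // The pattern from the question. Group 1 captures everything up to the
    // next semicolon; a '...' section is consumed as a unit, so a ';' inside
    // single quotes does not end the token.
    static final Pattern TOKEN = Pattern.compile("((?:(?:'[^']*')|[^;])*)[;]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // trim() is an assumption: it removes the space after each ';'
            tokens.add(m.group(1).trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello world, I am having a problem, using regex]
        System.out.println(tokenize("Hello world; I am having a problem; using regex;"));
    }
}
```

On short inputs like this one the pattern behaves as described; the error below only appears once a single token grows very long.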
But when I use a large input string, I get this error:
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
What causes this, and how can I solve it?
Recommended answer
Unfortunately, Java's built-in regex support has problems with regexes that repeat an alternation, that is, patterns of the form (A|B)*. Such a pattern is compiled into recursive calls, so matching it against a very large string can exhaust the call stack and throw a StackOverflowError.
One possible solution is to rewrite your regex so that it does not use a repeated alternation. But if your goal is simply to tokenize a string on semicolons, you don't really need a complex regex at all: just use String.split() with a simple ";" as the argument.
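As a rough sketch of that suggestion, the class below shows two options. splitSimple uses String.split(";") directly. Since a plain split also cuts at semicolons inside single quotes, which the original regex avoided, splitOutsideQuotes is a hand-written iterative scan that skips quoted sections. It uses no regex and no recursion, so its stack use does not grow with the input. The class and method names are hypothetical, and the scan assumes quotes are balanced and, like the original pattern, only emits tokens that end with a semicolon.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the original answer.
class SemicolonSplitDemo {
    // Plain split on ';'. String.split discards trailing empty strings,
    // so the final ';' in the input does not produce an empty token.
    static List<String> splitSimple(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split(";")) {
            tokens.add(part.trim());
        }
        return tokens;
    }

    // Quote-aware alternative: a single pass over the characters.
    // Assumes single quotes are balanced; an unclosed quote makes the
    // rest of the input count as quoted.
    static List<String> splitOutsideQuotes(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\'') {
                inQuotes = !inQuotes;      // entering or leaving a quoted section
                current.append(c);
            } else if (c == ';' && !inQuotes) {
                tokens.add(current.toString().trim());
                current.setLength(0);      // start the next token
            } else {
                current.append(c);
            }
        }
        // Text after the last ';' is dropped, mirroring the original
        // pattern, which requires each token to end with a semicolon.
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello world, I am having a problem, using regex]
        System.out.println(splitSimple("Hello world; I am having a problem; using regex;"));
        // prints [a 'x;y', b]
        System.out.println(splitOutsideQuotes("a 'x;y'; b;"));
    }
}
```

If the input never contains quoted semicolons, splitSimple is enough; the scanner is only worth the extra code when quoting matters.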