java.lang.StackOverflowError while using a regex to parse big strings

Question

This is my regex:

((?:(?:'[^']*')|[^;])*)[;]

It tokenizes a string on semicolons. For example,

Hello world; I am having a problem; using regex;

The result is three strings:

Hello world
I am having a problem
using regex
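For reference, here is a minimal sketch of how a pattern like this is usually applied with java.util.regex: call Matcher.find() repeatedly and take group 1 of each match. The class and method names are hypothetical. The trim() call is added only so the output matches the list above, because the regex itself keeps the space that follows each semicolon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenize {
    // The pattern from the question: a run of single-quoted sections or
    // non-semicolon characters, terminated by a semicolon.
    private static final Pattern TOKEN =
            Pattern.compile("((?:(?:'[^']*')|[^;])*)[;]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is the text before the semicolon; trim() removes the
            // leading space the regex keeps after each ';'.
            tokens.add(m.group(1).trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("Hello world; I am having a problem; using regex;")) {
            System.out.println(t);
        }
    }
}
```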

But when I use a large input string, I get this error:

Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)

How is this caused and how can I solve it?

Recommended answer

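Unfortunately, Java's built-in regex engine (java.util.regex) has problems with patterns that repeat an alternation, that is, (A|B)*. The matcher handles each iteration of such a loop with a recursive call (the repeating GroupHead, Loop, GroupTail and Branch frames in the stack trace above). In your pattern the group iterates once per character, or once per quoted section, of the text before each semicolon, so the recursion depth grows with the length of the input, and on a very large string the thread runs out of stack and throws StackOverflowError.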
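One way to keep a regex while avoiding a per-character alternation is the "unrolling the loop" technique: match runs of ordinary characters with a plain character class, and repeat a group only once per quoted section. The sketch below assumes quotes are always balanced. The original pattern also accepted a stray ' as an ordinary character, which this version does not. The class name is hypothetical. With no quotes in the input, the only repetition is [^;']*, which java.util.regex matches iteratively rather than by recursing per character. The repeated group still recurses once per quoted section, so input containing an enormous number of quoted sections could still exhaust the stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRewrite {
    // [^;']*              run of ordinary characters (no quote, no semicolon)
    // (?:'[^']*'[^;']*)*  each iteration consumes one quoted section plus the
    //                     ordinary run after it, so the group repeats once per
    //                     quoted section instead of once per character
    // ;                   the terminating semicolon
    // Assumes balanced single quotes.
    private static final Pattern TOKEN =
            Pattern.compile("([^;']*(?:'[^']*'[^;']*)*);");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1).trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One 1,000,000-character token with no quotes.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            big.append('a');
        }
        big.append(';');
        System.out.println(tokenize(big.toString()).get(0).length()); // prints 1000000
        System.out.println(tokenize("a='x;y'; b;"));                   // prints [a='x;y', b]
    }
}
```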

A possible solution is to rewrite your regex so that it does not repeat an alternation. But if your goal is simply to tokenize a string on semicolons, you don't really need a complex regex at all: just use String.split() with a simple ";" as the argument.
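Below is a minimal sketch of the String.split() approach. One caveat: the original pattern treated semicolons inside single-quoted sections as ordinary text, and split(";") does not, so a='x;y' gets cut in two. If quoted semicolons can occur, a plain loop over the characters handles them with constant stack depth. This is a swapped-in technique, not part of the answer above. The splitOutsideQuotes method does this and assumes quotes are balanced. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SemicolonSplit {
    // Splits on every ';'. split(String) drops trailing empty strings, so the
    // final ';' yields no empty last token. Unlike the original regex, text
    // after the last ';' (if any) is returned as a token too.
    static List<String> simpleSplit(String input) {
        return Arrays.asList(input.split(";"));
    }

    // Splits on ';' only outside single-quoted sections, assuming balanced
    // quotes. Like the original regex, text after the last ';' is dropped.
    static List<String> splitOutsideQuotes(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\'') {
                inQuotes = !inQuotes; // enter or leave a quoted section
                current.append(c);
            } else if (c == ';' && !inQuotes) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(simpleSplit("Hello world; I am having a problem; using regex;"));
        System.out.println(simpleSplit("a='x;y'; b;"));        // quoted section is cut in two
        System.out.println(splitOutsideQuotes("a='x;y'; b;")); // quoted section kept whole
    }
}
```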
