Getting data between single and double quotes (special case)
Problem description
I am writing a string parser that extracts all strings from a text file. The strings can be inside single or double quotes. Pretty simple, right? Well, not really. I wrote a regex that matches strings the way I want, but it throws a StackOverflowError on big strings (I am aware that Java's regex engine does not cope well with very large inputs). This is the regex pattern: (['"])(?:(?!\1|\\).|\\.)*\1
This works well for all the string inputs I need, but as soon as there is a big string it throws a StackOverflowError. I have read similar questions, such as this one, which suggests using StringUtils.substringsBetween, but that fails on strings like '""' and "\\\""
So my question is: what should I do to solve this issue? I can provide more context if needed, just comment.
After testing the answer
Code:
public static void main(String[] args) {
    final String regex = "'([^']*)'|\"(.*)\"";
    final String string = "local b = { [\"\\\\\"] = \"\\\\\\\\\", [\"\\\"\"] = \"\\\\\\\"\", [\"\\b\"] = \"\\\\b\", [\"\\f\"] = \"\\\\f\", [\"\\n\"] = \"\\\\n\", [\"\\r\"] = \"\\\\r\", [\"\\t\"] = \"\\\\t\" }\n" +
            "local c = { [\"\\\\/\"] = \"/\" }";
    final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
    final Matcher matcher = pattern.matcher(string);
    while (matcher.find()) {
        System.out.println("Full match: " + matcher.group(0));
        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println("Group " + i + ": " + matcher.group(i));
        }
    }
}
Output:
Full match: "\\"] = "\\\\", ["\""] = "\\\"", ["\b"] = "\\b", ["\f"] = "\\f", ["\n"] = "\\n", ["\r"] = "\\r", ["\t"] = "\\t"
Group 1: null
Group 2: \\"] = "\\\\", ["\""] = "\\\"", ["\b"] = "\\b", ["\f"] = "\\f", ["\n"] = "\\n", ["\r"] = "\\r", ["\t"] = "\\t
Full match: "\\/"] = "/"
Group 1: null
Group 2: \\/"] = "/
It's not handling the escaped quotes correctly.
Recommended answer
I would try it without capturing the quote type and without the lookahead and backreference, to improve performance. See this question about escaped characters in quoted strings. It contains a nice answer that uses an unrolled pattern. Try something like:
'[^\\']*(?:\\.[^\\']*)*'|"[^\\"]*(?:\\.[^\\"]*)*"
As a Java String:
String regex = "'[^\\\\']*(?:\\\\.[^\\\\']*)*'|\"[^\\\\\"]*(?:\\\\.[^\\\\\"]*)*\"";
The left side handles single-quoted strings, the right side double-quoted strings. If one kind is much more common than the other in your source, preferably put it on the left side of the pipe, so that the first alternative succeeds most of the time.
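To make the pattern concrete, here is a minimal sketch that applies it to a few of the tricky inputs from the question. The class name QuotedStringFinder and the helper findQuoted are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class QuotedStringFinder {
    // The unrolled pattern from the answer: no backreference, no lookahead.
    static final Pattern QUOTED = Pattern.compile(
            "'[^\\\\']*(?:\\\\.[^\\\\']*)*'|\"[^\\\\\"]*(?:\\\\.[^\\\\\"]*)*\"");

    // Returns every quoted string found in the input, quotes included.
    static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: local c = { ["\\/"] = "/" } 'a\'b' '""'
        String input = "local c = { [\"\\\\/\"] = \"/\" } 'a\\'b' '\"\"'";
        for (String quoted : findQuoted(input)) {
            System.out.println(quoted);
        }
    }
}
```

Note that . does not match line terminators by default, so a backslash followed by a newline inside a string would need Pattern.DOTALL; whether that matters depends on the input format.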
See this demo at regex101 (if you need to capture what's inside the quotes, use groups).
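One possible way to add those groups is sketched below: each alternative wraps its content in its own capturing group, so exactly one of group 1 (single-quoted) or group 2 (double-quoted) is non-null per match. The class name QuotedContentExtractor and the method contents are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class QuotedContentExtractor {
    // Same unrolled pattern, with the content of each branch captured:
    // group 1 = inside single quotes, group 2 = inside double quotes.
    static final Pattern QUOTED = Pattern.compile(
            "'([^\\\\']*(?:\\\\.[^\\\\']*)*)'|\"([^\\\\\"]*(?:\\\\.[^\\\\\"]*)*)\"");

    // Returns the text between the quotes for every match (escapes are left as-is).
    static List<String> contents(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            String single = matcher.group(1);
            // Only one branch participates in a match; the other group is null.
            result.add(single != null ? single : matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: x = 'a\'b' y = "c\"d" z = ''
        String input = "x = 'a\\'b' y = \"c\\\"d\" z = ''";
        for (String content : contents(input)) {
            System.out.println("[" + content + "]");
        }
    }
}
```

The escape sequences are returned unprocessed; turning \" into " would be a separate step.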