Getting data between single and double quotes (special case)

Question

I am writing a string parser that extracts all strings from a text file. The strings can be inside single or double quotes. Pretty simple, right? Well, not really. I wrote a regex that matches strings the way I want, but it throws a StackOverflowError on big strings (I am aware Java isn't great with regexes on big strings). This is the regex pattern: (['"])(?:(?!\1|\\).|\\.)*\1

This works well for all the string inputs I need, but as soon as there's a big string it throws a StackOverflowError. I have read similar questions, such as this one, which suggests using StringUtils.substringsBetween, but that fails on strings like '""' and "\\\"".

So my question is: what should I do to solve this issue? I can provide more context if needed; just comment.

After testing the answer

Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteRegexTest {
    public static void main(String[] args) {

        // Pattern under test: the double-quoted branch uses a greedy .* and has no escape handling.
        final String regex = "'([^']*)'|\"(.*)\"";
        final String string = "local b = { [\"\\\\\"] = \"\\\\\\\\\", [\"\\\"\"] = \"\\\\\\\"\", [\"\\b\"] = \"\\\\b\", [\"\\f\"] = \"\\\\f\", [\"\\n\"] = \"\\\\n\", [\"\\r\"] = \"\\\\r\", [\"\\t\"] = \"\\\\t\" }\n" +
                "local c = { [\"\\\\/\"] = \"/\" }";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // Print every match together with its capture groups.
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

Output:

Full match: "\\"] = "\\\\", ["\""] = "\\\"", ["\b"] = "\\b", ["\f"] = "\\f", ["\n"] = "\\n", ["\r"] = "\\r", ["\t"] = "\\t"
Group 1: null
Group 2: \\"] = "\\\\", ["\""] = "\\\"", ["\b"] = "\\b", ["\f"] = "\\f", ["\n"] = "\\n", ["\r"] = "\\r", ["\t"] = "\\t
Full match: "\\/"] = "/"
Group 1: null
Group 2: \\/"] = "/

It's not handling the escaped quotes correctly.

Recommended answer

I would try a pattern without capturing the quote type, without the lookahead and without the backreference, to improve performance. See this question about escaped characters in quoted strings; it contains a nice answer that uses an unrolled pattern. Try something like:

'[^\\']*(?:\\.[^\\']*)*'|"[^\\"]*(?:\\.[^\\"]*)*"

As a Java String:

String regex = "'[^\\\\']*(?:\\\\.[^\\\\']*)*'|\"[^\\\\\"]*(?:\\\\.[^\\\\\"]*)*\"";

The left side handles single-quoted strings, the right side double-quoted strings. If one kind is more common than the other in your source, put that alternative on the left side of the pipe.
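To make this concrete, here is a minimal sketch that applies the pattern above to a small input containing the tricky cases from the question ('""' and "\\\"") and then to one long quoted string. The class name QuotedStringDemo, the helper findQuoted and the sample inputs are made up for illustration. As far as I know, the original pattern overflows because java.util.regex recurses once per iteration of a group like (?:...|...)*, while a plain character-class loop such as [^\\"]* is matched iteratively; treat that explanation as my understanding rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStringDemo {
    // Unrolled pattern from the answer: single-quoted on the left, double-quoted on the right.
    static final Pattern QUOTED = Pattern.compile(
            "'[^\\\\']*(?:\\\\.[^\\\\']*)*'|\"[^\\\\\"]*(?:\\\\.[^\\\\\"]*)*\"");

    // Hypothetical helper: returns every quoted string, quotes included.
    static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: local c = { ["\\/"] = "/", ['""'] = "\\\"" }
        String sample = "local c = { [\"\\\\/\"] = \"/\", ['\"\"'] = \"\\\\\\\"\" }";
        for (String s : findQuoted(sample)) {
            System.out.println(s);
        }
        // Prints, one per line:  "\\/"   "/"   '""'   "\\\""

        // One long double-quoted string without escapes (100000 characters of content).
        StringBuilder big = new StringBuilder("\"");
        for (int i = 0; i < 100_000; i++) {
            big.append('a');
        }
        big.append('"');
        System.out.println(findQuoted(big.toString()).get(0).length()); // 100002
    }
}
```

Note that the group (?:\\.[^\\"]*)* still iterates once per escape sequence, so a single string with a very large number of escapes could in principle still stress the stack; the sketch only covers the long-string case without escapes.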

See this demo at regex101 (if you need to capture what's inside the quotes, use groups).
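The remark about groups could look roughly like the sketch below: one capturing group is wrapped around the content of each alternative, and whichever group is non-null holds the text between the quotes. QuotedContentDemo, contents and the sample input are hypothetical names chosen for the example. The captured text still contains the backslashes; unescaping would be a separate step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedContentDemo {
    // Same unrolled pattern, with group 1 around single-quoted content
    // and group 2 around double-quoted content.
    static final Pattern QUOTED = Pattern.compile(
            "'([^\\\\']*(?:\\\\.[^\\\\']*)*)'|\"([^\\\\\"]*(?:\\\\.[^\\\\\"]*)*)\"");

    // Hypothetical helper: returns the text between the quotes, escapes left as-is.
    static List<String> contents(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            // Only the group of the alternative that matched is non-null.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: a = 'it\'s', b = "say \"hi\""
        String sample = "a = 'it\\'s', b = \"say \\\"hi\\\"\"";
        System.out.println(contents(sample)); // [it\'s, say \"hi\"]
    }
}
```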
