Splitting a nested string keeping quotation marks

Views: 60. Published: 2021/5/18 19:42:04. Tags: java regex string


Question

I am working on a project in Java that requires having nested strings.

For an input string that looks like this in plain text:

This is "a string" and this is "a \"nested\" string"

the result must be the following:

[0] This
[1] is
[2] "a string"
[3] and
[4] this
[5] is
[6] "a \"nested\" string"

Note that I want the \" sequences to be kept.
I have the following method:

public static String[] splitKeepingQuotationMarks(String s);

and I need to build an array of strings from the given s parameter according to the rules above, without using the Java Collections Framework or its derivatives.

I am unsure how to solve this problem.
Can a regular expression be written that would solve it?

Update based on questions from the comments:

each unescaped " has a matching closing unescaped " (they are balanced)
each escape character \ must itself be escaped to appear literally (to represent \ in the text, we write it as \\).

Recommended answer

You can use the following regex:

"[^"\\]*(?:\\.[^"\\]*)*"|\S+

See the regex demo.

Java demo:

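import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        Pattern ptrn = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
        Matcher matcher = ptrn.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}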

Explanation:

"[^"\\]*(?:\\.[^"\\]*)*" - a double quote, followed by zero or more characters other than " and \ ([^"\\]*), then zero or more repetitions of an escape sequence (\\.) followed by zero or more characters other than " and \, and finally the closing double quote
| - or...
\S+ - one or more non-whitespace characters
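To make the pieces described above easier to see, the same pattern can be written with Pattern.COMMENTS, which lets whitespace and # comments sit inside the pattern text. This is only an illustrative sketch: the class name CommentedPatternSketch and the helper firstToken are made up for this example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentedPatternSketch {
    // Same regex as in the answer; with Pattern.COMMENTS the engine ignores
    // whitespace and everything from '#' to the end of the line.
    static final Pattern TOKEN = Pattern.compile(
          "\"               # opening double quote\n"
        + "[^\"\\\\]*       # run of characters that are neither quote nor backslash\n"
        + "(?:              # start of the repeated group\n"
        + "  \\\\.          #   a backslash plus the character it escapes\n"
        + "  [^\"\\\\]*     #   another run of ordinary characters\n"
        + ")*               # the group repeats zero or more times\n"
        + "\"               # closing double quote\n"
        + "|                # or ...\n"
        + "\\S+             # a run of non-whitespace characters\n",
        Pattern.COMMENTS);

    // Hypothetical helper: returns the first token found in s, or null if none.
    static String firstToken(String s) {
        Matcher m = TOKEN.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The quoted part is returned whole, escaped quotes included.
        System.out.println(firstToken("\"a \\\"nested\\\" string\" tail"));
    }
}
```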

Note

@Pshemo's suggestion, "\"(?:\\\\.|[^\"])*\"|\\S+" (or, more correctly, "\"(?:\\\\.|[^\"\\\\])*\"|\\S+"), matches the same text but is much less efficient because it uses an alternation group quantified with *. The regex engine has to choose between two alternatives at every position, which costs far more steps and backtracking. My version, based on the unroll-the-loop technique, matches whole chunks of text at once, so it is much faster and more reliable.
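As a quick sanity check that the two forms agree on the sample input, here is a small sketch that runs both patterns and compares the tokens they produce. It checks equivalence on one string only and does not measure speed; the class name PatternComparisonSketch and the helper tokens are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternComparisonSketch {
    // Unrolled-loop pattern from the answer.
    static final Pattern UNROLLED =
        Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
    // Corrected alternation-based pattern mentioned in the note.
    static final Pattern ALTERNATION =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\S+");

    // Hypothetical helper: joins all matches with newlines so results are easy to compare.
    static String tokens(Pattern p, String s) {
        StringBuilder sb = new StringBuilder();
        Matcher m = p.matcher(s);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        String a = tokens(UNROLLED, str);
        String b = tokens(ALTERNATION, str);
        System.out.println(a);
        System.out.println("Same tokens from both patterns: " + a.equals(b));
    }
}
```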

Update

Since a String[] is required as output, you need two passes: first count the matches, then create the array and re-run the matcher to fill it:

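import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        Pattern ptrn = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");
        Matcher matcher = ptrn.matcher(str);

        // First pass: count the matches so the array can be sized.
        int cnt = 0;
        while (matcher.find()) {
            cnt++;
        }
        System.out.println(cnt);

        // Second pass: rewind the matcher and copy each match into the array.
        String[] result = new String[cnt];
        matcher.reset();
        int idx = 0;
        while (matcher.find()) {
            result[idx] = matcher.group(0);
            idx++;
        }
        System.out.println(Arrays.toString(result));
    }
}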

See another IDEONE demo.
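The two-pass idea can also be packaged behind the method signature from the question. The following is a minimal sketch, not the answer author's code: the class name QuoteSplitter is an assumption, and the result is printed with a plain loop because java.util.Arrays is documented as a member of the Java Collections Framework, which the question asks to avoid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteSplitter {
    // Compiled once and reused; same regex as in the answer.
    private static final Pattern TOKEN =
        Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|\\S+");

    public static String[] splitKeepingQuotationMarks(String s) {
        Matcher matcher = TOKEN.matcher(s);

        // First pass: count the tokens so the array can be sized exactly.
        int count = 0;
        while (matcher.find()) {
            count++;
        }

        // Second pass: rewind to the start and copy each token into the array.
        String[] result = new String[count];
        matcher.reset();
        for (int i = 0; i < count && matcher.find(); i++) {
            result[i] = matcher.group();
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "This is \"a string\" and this is \"a \\\"nested\\\" string\"";
        String[] parts = splitKeepingQuotationMarks(str);
        // Print in the "[index] token" layout used in the question.
        for (int i = 0; i < parts.length; i++) {
            System.out.println("[" + i + "] " + parts[i]);
        }
    }
}
```

Scanning the input twice keeps the code free of any growable container; the cost is that the regex runs over the string two times.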
