Regular expression to split a string based on spaces and matching quotes in Java


Question

I have a String that I need to split based on spaces and on exactly matching quotes.

For example, if

string = "It is fun \"to write\" regular\"expression"

then after splitting I want the result to be:

It
is
fun
"to write"
regular
"expression

The closest I came to doing this was with the following regular expression:

STRING_SPLIT_REGEXP = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'"

Thanks in advance for any answers.

Answer

It seems that you just took the regex from this answer, but as you can see, that answer doesn't use split; it uses the find method of the Matcher class. That answer also takes care of ' (single quotes), of which your input shows no sign.
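To illustrate the difference, here is a minimal sketch (the class and method names are hypothetical) that applies the question's regex once with Matcher.find and once with Pattern.split, which behaves like String.split here. find returns the substrings that match the pattern, while split treats every match as a delimiter and returns only what lies between the matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVsFind {
    // Regex from the question: it describes the tokens, not the separators.
    static final Pattern TOKEN = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

    // find(): collects every substring that matches the token pattern.
    static List<String> withFind(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // split(): uses every match as a delimiter, so only the text BETWEEN tokens survives.
    static List<String> withSplit(String input) {
        return Arrays.asList(TOKEN.split(input));
    }

    public static void main(String[] args) {
        String data = "It is fun \"to write\" regular\"expression";
        System.out.println(withFind(data));         // [It, is, fun, "to write", regular, expression]
        System.out.println(withSplit(data).size()); // 6: "", four single spaces and a lone quote
    }
}
```

Note that find silently skips the unmatched quote before expression, because none of the three alternatives can match it.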

So you can improve this regex by removing the parts that handle ', which makes it look like

[^\\s\"]+|\"([^\"]*)\"
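A minimal sketch of what this change does (the class name and sample input are hypothetical): once the single-quote alternatives are gone, an apostrophe is treated like any other non-space character, so a word such as don't is no longer broken in two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleQuoteDemo {
    static final String WITH_SINGLE_QUOTES = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'";
    static final String DOUBLE_QUOTES_ONLY = "[^\\s\"]+|\"([^\"]*)\"";

    // Collects all matches of the given regex, in order of appearance.
    static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "don't \"stop now\"";
        System.out.println(tokens(WITH_SINGLE_QUOTES, data)); // [don, t, "stop now"]
        System.out.println(tokens(DOUBLE_QUOTES_ONLY, data)); // [don't, "stop now"]
    }
}
```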

Also, since you want to include " as part of the token, you don't need to capture the text between the quotes in a separate group, so get rid of the parentheses in the \"([^\"]*)\" part:

[^\\s\"]+|\"[^\"]*\"
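The parentheses only created a capturing group for the text between the quotes. A small sketch (class and method names are hypothetical) shows that group() already returns the whole quoted token, quotes included, with or without that group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    static final Pattern WITH_GROUP = Pattern.compile("[^\\s\"]+|\"([^\"]*)\"");
    static final Pattern WITHOUT_GROUP = Pattern.compile("[^\\s\"]+|\"[^\"]*\"");

    // Returns the whole first match (group 0), or null if nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String quoted = "\"to write\"";
        // group() is the entire match, quotes included, for both patterns.
        System.out.println(firstMatch(WITH_GROUP, quoted));    // "to write"
        System.out.println(firstMatch(WITHOUT_GROUP, quoted)); // "to write"
        // Only the first pattern defines a capturing group.
        System.out.println(WITH_GROUP.matcher("").groupCount());    // 1
        System.out.println(WITHOUT_GROUP.matcher("").groupCount()); // 0
    }
}
```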

Now all you need to do is add the case where there is no closing " and you reach the end of the string instead. So change this regex to

[^\\s\"]+|\"[^\"]*(\"|$)
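To see what the new (\"|$) alternative changes, the sketch below (class name hypothetical) runs the previous regex and this one on the question's input. Without it, the lone quote before expression cannot match and is skipped, so the quote is lost; with it, the quote runs to the end of the string and stays part of the token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnclosedQuoteDemo {
    static final String CLOSED_ONLY = "[^\\s\"]+|\"[^\"]*\"";
    static final String CLOSED_OR_END = "[^\\s\"]+|\"[^\"]*(\"|$)";

    // Collects all matches of the given regex, in order of appearance.
    static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "It is fun \"to write\" regular\"expression";
        System.out.println(tokens(CLOSED_ONLY, data));   // [It, is, fun, "to write", regular, expression]
        System.out.println(tokens(CLOSED_OR_END, data)); // [It, is, fun, "to write", regular, "expression]
    }
}
```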

After this you can just use a Matcher to find all tokens and store them somewhere, let's say in a List.

Example:

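import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "It is fun \"to write\" regular\"expression";
List<String> matchList = new ArrayList<String>();
// A token is a run of non-space, non-quote characters, or a quote up to the next quote (or the end of input)
Pattern regex = Pattern.compile("[^\\s\"]+|\"[^\"]*(\"|$)");
Matcher regexMatcher = regex.matcher(data);
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group());
    matchList.add(regexMatcher.group());
}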

Output:

It
is
fun
"to write"
regular
"expression






A more complex expression that handles this data using split alone could look like

String data = "It is fun \"to write\" regular \"expression";
for(String s : data.split("(?<!\\G)(?<=\\G[^\"]*(\"[^\"]{0,100000}\")?[^\"]*)((?<=\"(?!\\s))|\\s+|(?=\"))"))
    System.out.println(s);

But this approach is way more complicated than writing your own parser.

Such a parser could look like

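import java.util.ArrayList;
import java.util.List;

// Splits on spaces outside quotes; a quoted section (quotes included) becomes one token,
// and an unclosed quote runs to the end of the input.
public static List<String> parse(String data) {
    List<String> tokens = new ArrayList<String>();
    StringBuilder sb = new StringBuilder();
    boolean insideQuote = false;
    char previous = '\0';

    for (char ch : data.toCharArray()) {
        if (ch == ' ' && !insideQuote) {
            // A space outside quotes ends the current token; the space itself is dropped
            if (sb.length() > 0 && previous != '"')
                addTokenAndResetBuilder(sb, tokens);
        } else if (ch == '"') {
            if (insideQuote) {
                // Closing quote: keep it and emit the whole quoted token
                sb.append(ch);
                addTokenAndResetBuilder(sb, tokens);
            } else {
                // Opening quote: emit any pending unquoted text, then start a quoted token
                addTokenAndResetBuilder(sb, tokens);
                sb.append(ch);
            }
            insideQuote = !insideQuote;
        } else {
            // Any other character (including a space inside quotes) belongs to the current token
            sb.append(ch);
        }
        previous = ch;
    }
    // Emit whatever is left, e.g. the text after an unclosed quote
    addTokenAndResetBuilder(sb, tokens);

    return tokens;
}

// Adds the builder's content as a token (if non-empty) and clears the builder
private static void addTokenAndResetBuilder(StringBuilder sb, List<String> list) {
    if (sb.length() > 0) {
        list.add(sb.toString());
        sb.delete(0, sb.length());
    }
}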

Usage:

String data = "It is fun \"to write\" regular\"expression\"xxx\"yyy";
for (String s : parse(data))
    System.out.println(s);
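For comparison, here is a minimal sketch (the class and method names are hypothetical) that runs the Matcher-based regex on the same usage input. It prints It, is, fun, "to write", regular, "expression", xxx and "yyy, one per line, which is also what parse returns for this input, so on this data both approaches agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexVsParserDemo {
    // Same token regex as in the Matcher example above.
    static final Pattern TOKEN = Pattern.compile("[^\\s\"]+|\"[^\"]*(\"|$)");

    // Collects every token found by the regex, in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String data = "It is fun \"to write\" regular\"expression\"xxx\"yyy";
        for (String s : tokenize(data))
            System.out.println(s);
    }
}
```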
