RegEx To Ignore Text Between Quotes

Question

I have a regex, which is [\\.|\\;|\\?|\\!][\\s]
It is used to split a string, but I don't want it to split on . ; ? ! when the punctuation is inside quotes.
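
To make the problem concrete, here is a minimal sketch of what the split currently does. The class name, the helper naiveSplit, and the sample sentence are hypothetical and only serve as illustration; the pattern is the one from the question, written as a Java string literal. Note that inside a character class the | is a literal pipe character, not an alternation.

```java
import java.util.Arrays;
import java.util.List;

public class SplitProblemDemo {

    // The pattern from the question: one of . | ; ? ! followed by a whitespace character.
    static final String SPLIT_REGEX = "[\\.|\\;|\\?|\\!][\\s]";

    // Hypothetical helper: splits the text with String.split and the pattern above.
    static List<String> naiveSplit(String text) {
        return Arrays.asList(text.split(SPLIT_REGEX));
    }

    public static void main(String[] args) {
        // Hypothetical sample input with punctuation inside a quoted part.
        String text = "He said \"Stop! Now.\" loudly. Then he left.";

        // The split also breaks after "Stop!", which sits inside the quotes.
        for (String part : naiveSplit(text)) {
            System.out.println("> " + part);
        }
    }
}
```

The quoted part is cut in two because split has no notion of whether a match position lies inside or outside a pair of quotes.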

Recommended answer

I would not use split here, but Pattern and Matcher instead.

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {

        String text = "start. \"in quotes!\"; foo? \"more \\\" words\"; bar";

        String simpleToken = "[^.;?!\\s\"]+";

        String quotedToken =
                "(?x)             # enable inline comments and ignore white spaces in the regex         \n" +
                "\"               # match a double quote                                                \n" +
                "(                # open group 1                                                        \n" +
                "  \\\\.          #   match a backslash followed by any char (other than line breaks)   \n" +
                "  |              #   OR                                                                \n" +
                "  [^\\\\\\r\\n\"] #   any character other than a backslash, line breaks or double quote \n" +
                ")                # close group 1                                                       \n" +
                "*                # repeat group 1 zero or more times                                   \n" +
                "\"               # match a double quote                                                \n";

        String regex = quotedToken + "|" + simpleToken;

        Matcher m = Pattern.compile(regex).matcher(text);

        while(m.find()) {
            System.out.println("> " + m.group());
        }
    }
}

This produces:

> start
> "in quotes!"
> foo
> "more \" words"
> bar

As you can see, it also handles escaped quotes inside quoted tokens.
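
If the tokens are needed as a collection rather than printed, the same idea can be wrapped in a small method. The following is only a sketch under assumptions: the class name QuoteAwareTokenizer and the method tokenize are hypothetical, and the pattern is a compact, comment-free form of the two token patterns from the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareTokenizer {

    // Quoted token: a double quote, then any number of escaped characters or
    // characters other than backslash, line breaks and double quote, then a double quote.
    private static final String QUOTED = "\"(?:\\\\.|[^\\\\\\r\\n\"])*\"";

    // Simple token: a run of characters that are not . ; ? ! whitespace or a double quote.
    private static final String SIMPLE = "[^.;?!\\s\"]+";

    // The quoted alternative is tried first so that quoted text is kept in one piece.
    private static final Pattern TOKEN = Pattern.compile(QUOTED + "|" + SIMPLE);

    // Hypothetical helper: returns every token found in the text, in order.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "start. \"in quotes!\"; foo? \"more \\\" words\"; bar";
        for (String token : tokenize(text)) {
            System.out.println("> " + token);
        }
    }
}
```

With this pattern an unterminated quote is not matched as a quoted token; the text after it falls through to the simple-token rule, so it is worth deciding how such input should be treated before relying on the result.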
