Regex and escaped and unescaped delimiter


Problem description

question related to this

I have a string

a\;b\\;c;d

which in Java looks like

String s = "a\\;b\\\\;c;d"

I need to split it by semicolon with following rules:

  1. If semicolon is preceded by backslash, it should not be treated as separator (between a and b).

  2. If the backslash itself is escaped and therefore does not escape the semicolon, that semicolon should be a separator (between b and c).

So a semicolon should be treated as a separator if it is preceded by either zero or an even number of backslashes.

For example above, I want to get following strings (double backslashes for java compiler):

a\;b\\
c
d
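
The parity rule above can also be written out as a plain character scan, which may help when checking what any regex-based solution is supposed to produce. This is only a minimal sketch: the class name `BackslashParitySplit` and the method `split` are hypothetical names chosen for illustration, and the code assumes that escape sequences should be kept verbatim in the output, as in the expected result above.

```java
import java.util.ArrayList;
import java.util.List;

final class BackslashParitySplit {
    /** Splits on every semicolon preceded by zero or an even number of backslashes. */
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int backslashRun = 0; // length of the backslash run directly before the current char
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == ';' && backslashRun % 2 == 0) {
                parts.add(current.toString()); // unescaped semicolon: close the field
                current.setLength(0);
            } else {
                current.append(ch); // escaped characters are kept as written
            }
            backslashRun = (ch == '\\') ? backslashRun + 1 : 0;
        }
        parts.add(current.toString()); // last field (may be empty)
        return parts;
    }

    public static void main(String[] args) {
        // Java literal for the raw text a\;b\\;c;d
        System.out.println(split("a\\;b\\\\;c;d"));
    }
}
```

Unlike a match-based approach, this scan keeps empty fields, so `a;;b` yields three parts with an empty string in the middle.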

Solution

You can use the regex

(?:\\.|[^;\\]++)*

to match all text between unescaped semicolons:

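import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // The pattern also matches the empty string at each separator and at end of input; skip those.
    if (regexMatcher.end() > regexMatcher.start()) {
        matchList.add(regexMatcher.group());
    }
}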

Explanation:

(?:        # Match either...
 \\.       # any escaped character
|          # or...
 [^;\\]++  # any character(s) except semicolon or backslash; possessive match
)*         # Repeat any number of times.

The possessive match (++) is important to avoid catastrophic backtracking because of the nested quantifiers.
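
For reference, here is one way the regex might be wrapped into a self-contained program and run against the string from the question. It is a sketch rather than the answer's definitive form: the class name `EscapedDelimiterDemo` and the method `fields` are hypothetical. Because the pattern ends in `*`, it can match the empty string, so `find()` also reports zero-length matches at each separator and at the end of the input; the length check below filters them out. That filter is an assumption about the desired result, and it also discards genuinely empty fields such as the one in `a;;b`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedDelimiterDemo {
    // Same regex as above: an escaped character, or a possessive run of
    // characters that are neither semicolon nor backslash, repeated.
    private static final Pattern FIELD = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");

    static List<String> fields(String subject) {
        List<String> matchList = new ArrayList<>();
        Matcher matcher = FIELD.matcher(subject);
        while (matcher.find()) {
            // Keep only non-empty matches (see the note in the text above).
            if (matcher.end() > matcher.start()) {
                matchList.add(matcher.group());
            }
        }
        return matchList;
    }

    public static void main(String[] args) {
        // Java literal for the raw text a\;b\\;c;d
        for (String field : fields("a\\;b\\\\;c;d")) {
            System.out.println(field);
        }
    }
}
```

Run on the example, this prints the three strings the question asks for, one per line.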
