Regex and escaped and unescaped delimiter
This is related to an earlier question.
I have a string
a\;b\\;c;d
which in Java looks like
String s = "a\\;b\\\\;c;d";
I need to split it on semicolons according to the following rules:
- If a semicolon is preceded by a backslash, it should not be treated as a separator (the one between a and b).
- If the backslash is itself escaped, and therefore does not escape the semicolon, that semicolon should be a separator (the one between b and c).
So a semicolon should be treated as a separator if it is preceded by zero or an even number of backslashes.
For the example above, I want to get the following strings (in Java source, each backslash must be doubled):
a\;b\\
c
d
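The rule above (a semicolon separates only when an even number of backslashes, including zero, comes directly before it) can be sketched without any regex. This is only an illustration of the rule, not part of the original question; the class name BackslashParitySplit and the method split are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the "even number of backslashes" rule by hand.
final class BackslashParitySplit {

    // Splits on ';' only when it is preceded by an even number
    // (including zero) of consecutive backslashes.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int backslashRun = 0; // consecutive backslashes directly before the current char

        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == ';' && backslashRun % 2 == 0) {
                // Unescaped semicolon: close the current field.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character, backslash, or escaped semicolon: keep it verbatim.
                current.append(ch);
            }
            backslashRun = (ch == '\\') ? backslashRun + 1 : 0;
        }
        parts.add(current.toString()); // last field (may be empty)
        return parts;
    }

    public static void main(String[] args) {
        // The Java literal below is the string a\;b\\;c;d
        System.out.println(split("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
    }
}
```

Unlike a regex that matches the text between separators, this scanner keeps empty fields, so "a;;b" yields three parts.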
You can use the regex
(?:\\.|[^;\\]++)*
to match all text between unescaped semicolons:
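List<String> matchList = new ArrayList<String>();
// In a Java string literal every regex backslash is doubled: "\\\\" is the regex \\ (one literal backslash).
Pattern regex = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // The pattern can also match the empty string (for example at each separator), so skip empty matches.
    if (!regexMatcher.group().isEmpty()) {
        matchList.add(regexMatcher.group());
    }
}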
Explanation:
(?: # Match either...
\\. # any escaped character
| # or...
[^;\\]++ # any character(s) except semicolon or backslash; possessive match
)* # Repeat any number of times.
The possessive quantifier (++) is important to avoid catastrophic backtracking caused by the nested quantifiers.
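For reference, here is a minimal self-contained sketch that runs the regex from this answer against the sample string from the question. The class name RegexFieldMatcher and the method fields are invented for the example. It assumes empty fields are not needed: because the pattern ends in *, find() also reports an empty match at each separator and at the end of the input, and the sketch simply skips those.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from the answer.
final class RegexFieldMatcher {

    // Regex (?:\\.|[^;\\]++)* written as a Java string literal.
    private static final Pattern FIELD = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");

    // Returns the non-empty runs of text between unescaped semicolons.
    static List<String> fields(String subject) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(subject);
        while (matcher.find()) {
            String field = matcher.group();
            if (!field.isEmpty()) { // skip the empty matches found at separators
                result.add(field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The Java literal below is the string a\;b\\;c;d
        System.out.println(fields("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
    }
}
```

If empty fields matter (for example "a;;b" should give three parts), skipping empty matches is not enough, and a different approach would be needed.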