Split a string at commas but avoid escaped commas and backslashes
Question
I'd like to split a string at each comma ",". The string contains escaped commas "\," and escaped backslashes "\\". Commas at the beginning and end, as well as several commas in a row, should lead to empty strings.
So ",,\,\\,," should become "", "", "\,\\", "", "".
Note that my example strings show a backslash as a single "\". Java string literals would have them doubled.
I tried several packages but had no success. My last idea would be to write my own parser.
Recommended answer
While a dedicated library is certainly a good idea, the following will work:
    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }
The idea is to find a "," prefixed by no "\" or by an even number of "\" (i.e. a "," that is not escaped). As the "," is the last part of the pattern, the substring is cut at end() - 1, which is just before the ",".
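To make the even-number rule concrete, here is a small sketch that reports the index of every comma the pattern treats as a separator. The class name SplitPositions and the helper separatorIndexes are made up for this illustration; only the regex is taken from splitValues above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class SplitPositions {
    // Same regex as in splitValues: a comma preceded by zero or more
    // complete backslash pairs that are not themselves preceded by a backslash.
    private static final Pattern SEPARATOR =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Returns the index of every comma that acts as a separator.
    static List<Integer> separatorIndexes(final String input) {
        final List<Integer> indexes = new ArrayList<>();
        final Matcher matcher = SEPARATOR.matcher(input);
        while (matcher.find()) {
            // The comma is the last character of each match.
            indexes.add(matcher.end() - 1);
        }
        return indexes;
    }

    public static void main(final String[] args) {
        // Raw text a\,b,c : the first comma is escaped, the second one splits.
        System.out.println(separatorIndexes("a\\,b,c"));   // prints [4]
        // Raw text a\\,b : the backslash is escaped, so the comma splits.
        System.out.println(separatorIndexes("a\\\\,b"));   // prints [3]
        // Raw text a\\\,b : one pair plus a single backslash, comma is escaped.
        System.out.println(separatorIndexes("a\\\\\\,b")); // prints []
    }
}
```

One backslash before the comma means no split, two mean a split, and three mean no split again, which is the behaviour described above.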
The function is tested against most cases I can think of, except for null input. If you prefer handling a List<String>, you can of course change the return type; I just adopted the pattern implemented in split() to handle escapes.
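As a rough sketch of that variation, the version below returns a List<String> and also guards against null. Returning an empty list for null is an assumption made for this example, not something the answer specifies; throwing an exception would be an equally reasonable choice. The names SplitToList and splitToList are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of splitValues, not part of the original answer.
class SplitToList {
    // Compiled once, since the pattern never changes.
    private static final Pattern SEPARATOR =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    static List<String> splitToList(final String input) {
        final List<String> result = new ArrayList<>();
        if (input == null) {
            // Assumption: treat null as "nothing to split".
            return result;
        }
        final Matcher matcher = SEPARATOR.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // Cut just before the separating comma.
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        // Whatever follows the last separator (possibly the empty string).
        result.add(input.substring(previous));
        return result;
    }

    public static void main(final String[] args) {
        // Raw text ,,\,\\,, as in the question.
        System.out.println(splitToList(",,\\,\\\\,,").size()); // prints 5
        System.out.println(splitToList(null));                 // prints []
    }
}
```

Like the original, this keeps the escape sequences in the returned values; it only decides where to split.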
Example class utilizing this function:
    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Print {
        public static void main(final String[] args) {
            String input = ",,\\,\\\\,,";
            final String[] strings = splitValues(input);
            System.out.print("\"" + input + "\" => ");
            printQuoted(strings);
        }

        public static String[] splitValues(final String input) {
            final ArrayList<String> result = new ArrayList<String>();
            // (?:\\\\)* matches any number of \-pairs
            // (?<!\\) ensures that the \-pairs aren't preceded by a single \
            final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
            final Matcher matcher = pattern.matcher(input);
            int previous = 0;
            while (matcher.find()) {
                result.add(input.substring(previous, matcher.end() - 1));
                previous = matcher.end();
            }
            result.add(input.substring(previous, input.length()));
            return result.toArray(new String[result.size()]);
        }

        public static void printQuoted(final String[] strings) {
            if (strings.length > 0) {
                System.out.print("[\"");
                System.out.print(strings[0]);
                for (int i = 1; i < strings.length; i++) {
                    System.out.print("\", \"");
                    System.out.print(strings[i]);
                }
                System.out.println("\"]");
            } else {
                System.out.println("[]");
            }
        }
    }