Splitting on comma outside quotes
Question
My program reads a line from a file. This line contains comma-separated text like:
123,test,444,"don't split, this",more test,1
I would like the result of a split to be this:
123
test
444
"don't split, this"
more test
1
If I use String.split(","), I get this:
123
test
444
"don't split
this"
more test
1
In other words: the comma in the substring "don't split, this" is not a separator. How do I deal with this?
Thanks in advance.
Jakob
Recommended answer
You can try out this regex:
str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
This splits the string on every comma (,) that is followed by an even number of double quotes. In other words, it splits on commas outside double quotes. This works provided the quotes in your string are balanced.
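As a quick check, here is a minimal sketch that applies the regex to the sample line from the question. The class name QuoteAwareSplit and the helper splitOutsideQuotes are made up for this illustration, and the -1 limit is my own addition so that trailing empty fields are kept; the answer itself calls split with the regex only.

```java
class QuoteAwareSplit {
    // Regex from the answer: a comma followed by an even number of double quotes.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper name, not part of the original answer.
    static String[] splitOutsideQuotes(String line) {
        // Assumption: limit -1 keeps trailing empty fields,
        // which split(regex) without a limit would discard.
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Prints one field per line; the quoted field stays in one piece.
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

Note that the quotes remain part of the field, which matches the output asked for in the question.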
Explanation:
, // Split on comma
(?= // Followed by
(?: // Start a non-capture group
[^"]* // 0 or more non-quote characters
" // 1 quote
[^"]* // 0 or more non-quote characters
" // 1 quote
)* // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
[^"]* // Finally 0 or more non-quotes
$ // Till the end (This is necessary, else every comma will satisfy the condition)
)
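To see why the $ matters, the following sketch compares the regex with and without the anchor on the sample line. The class name DollarAnchorDemo and the helper countFields are invented for this illustration. Without $, the look-ahead can succeed by matching nothing at all, so it no longer filters any comma.

```java
class DollarAnchorDemo {
    // The regex from the answer, anchored to the end of the input.
    static final String WITH_ANCHOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
    // Same regex with the $ removed: the look-ahead can now match the empty string.
    static final String WITHOUT_ANCHOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*)";

    // Hypothetical helper: number of fields produced by splitting on the regex.
    static int countFields(String line, String regex) {
        return line.split(regex, -1).length;
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        System.out.println(countFields(line, WITH_ANCHOR));    // prints 6
        System.out.println(countFields(line, WITHOUT_ANCHOR)); // prints 7
    }
}
```

The unanchored version behaves like a plain split(","): the comma inside the quoted field is treated as a separator again.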
You can even write the regex this way in your code by using the (?x) modifier. The modifier ignores whitespace in the regex, so a pattern broken across multiple lines becomes easier to read:
String[] arr = str.split("(?x) " +
", " + // Split on comma
"(?= " + // Followed by
" (?: " + // Start a non-capture group
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" )* " + // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
" [^\"]* " + // Finally 0 or more non-quotes
" $ " + // Till the end (This is necessary, else every comma will satisfy the condition)
") " // End look-ahead
);
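If the regex feels too dense, the same rule can also be written as a single pass over the characters. This is a different technique from the one in the answer, sketched here only for comparison: it toggles a flag on every double quote and splits on commas seen while the flag is off. The class name QuoteAwareScanner and its method are hypothetical, and like the regex it assumes balanced quotes and does not handle escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareScanner {
    // Hypothetical helper: splits on commas that are not inside double quotes.
    static List<String> splitOutsideQuotes(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Entering or leaving a quoted section; keep the quote in the field.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                // Separator outside quotes: close the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing comma, so add it explicitly.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

For input with balanced quotes this should give the same fields as the regex, while reading each character only once instead of scanning ahead to the end of the line for every comma.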