Splitting on comma outside quotes


Question

My program reads a line from a file. This line contains comma-separated text like:

123,test,444,"don't split, this",more test,1

I would like the result of a split to be this:

123
test
444
"don't split, this"
more test
1

If I use String.split(","), I get this:

123
test
444
"don't split
 this"
more test
1

In other words: the comma in the substring "don't split, this" is not a separator. How do I deal with this?

Thanks in advance,
Jakob

Recommended answer

You can try out this regex:

str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

This splits the string on every comma that is followed by an even number of double quotes. In other words, it splits on commas outside the double quotes. It works provided the quotes in your string are balanced.
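To check the behaviour on the sample line from the question, here is a minimal sketch. The class name SplitOutsideQuotes and the helper splitOutsideQuotes are made up for this illustration; only the regex itself comes from the answer.

```java
public class SplitOutsideQuotes {

    // Hypothetical helper: splits on commas that are followed by an even
    // number of double quotes, i.e. commas outside a quoted section.
    static String[] splitOutsideQuotes(String line) {
        return line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Print one field per line, as in the question's desired output.
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

Running main should print the six fields listed in the question, with "don't split, this" kept together (quotes included). Note that String.split drops trailing empty strings, so a line ending in commas loses its empty last fields unless you call split(regex, -1).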

Explanation:

,           // Split on comma
(?=         // Followed by
   (?:      // Start a non-capture group
     [^"]*  // 0 or more non-quote characters
     "      // 1 quote
     [^"]*  // 0 or more non-quote characters
     "      // 1 quote
   )*       // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
   [^"]*    // Finally 0 or more non-quotes
   $        // Till the end  (This is necessary, else every comma will satisfy the condition)
)           // End look-ahead

You can even write it like this in your code by using the (?x) modifier in your regex. The modifier makes the regex engine ignore whitespace in the pattern, so a regex broken across multiple lines becomes easier to read:

String[] arr = str.split("(?x)   " + 
                     ",          " +   // Split on comma
                     "(?=        " +   // Followed by
                     "  (?:      " +   // Start a non-capture group
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "  )*       " +   // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
                     "  [^\"]*   " +   // Finally 0 or more non-quotes
                     "  $        " +   // Till the end  (This is necessary, else every comma will satisfy the condition)
                     ")          "     // End look-ahead
                         );
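If the same split runs on many lines, one possible variation is to compile the pattern once and pass the Pattern.COMMENTS flag, which is the flag form of the inline (?x) modifier. In that mode a # starts a comment that runs to the end of the line, so the explanation can live inside the regex string itself. This is only a sketch; the class name CommentedSplit, the constant COMMA_OUTSIDE_QUOTES and the helper split are names invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CommentedSplit {

    // Compiled once and reused. With Pattern.COMMENTS, whitespace in the
    // pattern is ignored and '#' begins a comment that ends at the newline.
    private static final Pattern COMMA_OUTSIDE_QUOTES = Pattern.compile(
            ",                        # a comma\n" +
            "(?=                      # followed by\n" +
            "  (?:[^\"]*\"[^\"]*\")*  # zero or more pairs of double quotes\n" +
            "  [^\"]*$                # then no further quotes up to the end\n" +
            ")",
            Pattern.COMMENTS);

    // Hypothetical helper wrapping Pattern.split for a single line.
    static String[] split(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        System.out.println(Arrays.toString(split(line)));
    }
}
```

The same caveat applies as for the inline version: the look-ahead only gives sensible results when the double quotes in the line are balanced.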
