Splitting on comma outside quotes

Problem description

My program reads a line from a file. The line contains comma-separated text like this:

123,test,444,"don't split, this",more test,1

I would like the result of the split to be this:

123
test
444
"don't split, this"
more test
1

If I use String.split(","), I get this:

123
test
444
"don't split
 this"
more test
1

In other words, the comma in the substring "don't split, this" is not a separator. How can I deal with this?

Recommended answer

You can try this regex:

str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

This splits the string on every comma that is followed by an even number of double quotes, that is, on commas outside double-quoted sections. It works provided the quotes in your string are balanced.
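To make the behaviour concrete, here is a minimal, self-contained sketch that applies the regex to the sample line from the question. The class name QuoteAwareSplit and the helper splitOutsideQuotes are hypothetical names chosen for this illustration. Note that the double quotes inside the Java string literal must be escaped with a backslash.

```java
public class QuoteAwareSplit {
    // A comma that is followed by an even number of double quotes
    // up to the end of the input (i.e. a comma outside quotes).
    static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one line on commas outside double quotes.
    // Assumes the quotes in the line are balanced.
    static String[] splitOutsideQuotes(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

Run against the sample line, this prints six fields, and the quoted field "don't split, this" stays in one piece with its quotes.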

Explanation:

,           // Split on comma
(?=         // Followed by
   (?:      // Start a non-capture group
     [^"]*  // 0 or more non-quote characters
     "      // 1 quote
     [^"]*  // 0 or more non-quote characters
     "      // 1 quote
   )*       // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
   [^"]*    // Finally 0 or more non-quotes
   $        // Till the end  (This is necessary, else every comma will satisfy the condition)
)           // End look-ahead

You can even write the regex this way in your code by using the (?x) modifier. The modifier makes the regex engine ignore whitespace in the pattern, so a regex broken across multiple lines becomes easier to read:

String[] arr = str.split("(?x)   " + 
                     ",          " +   // Split on comma
                     "(?=        " +   // Followed by
                     "  (?:      " +   // Start a non-capture group
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "  )*       " +   // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
                     "  [^\"]*   " +   // Finally 0 or more non-quotes
                     "  $        " +   // Till the end  (This is necessary, else every comma will satisfy the condition)
                     ")          "     // End look-ahead
                         );
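Since the program in the question reads lines from a file, it may be worth compiling the pattern once and reusing it. The sketch below is one possible variant, not part of the original answer: it uses the Pattern.COMMENTS flag (the flag equivalent of the inline (?x) modifier), which also allows # comments inside the pattern itself. The class name CommentedSplit and its split method are hypothetical. Passing a limit of -1 is an additional assumption about the desired behaviour: it keeps trailing empty fields, which the plain split call would drop.

```java
import java.util.regex.Pattern;

public class CommentedSplit {
    // Compiled once; Pattern.COMMENTS ignores whitespace in the pattern
    // and treats '#' up to the end of a line as a comment.
    static final Pattern COMMA_OUTSIDE_QUOTES = Pattern.compile(
            ",                        # a comma\n" +
            "(?=                      # followed by\n" +
            "  (?:[^\"]*\"[^\"]*\")*  # zero or more pairs of quotes\n" +
            "  [^\"]*$                # then only non-quotes until the end\n" +
            ")",
            Pattern.COMMENTS);

    // Hypothetical helper: the limit -1 keeps trailing empty fields,
    // so "a,b," yields three fields instead of two.
    static String[] split(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

As with the original regex, this assumes balanced quotes in each line.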
