Parsing a CSV string while ignoring commas inside the individual columns


Question

I am trying to split a CSV string using a comma as the delimiter.

val string = "A,B,\"Hi,There\",C,D"

I cannot use string.split(",") because it would split "Hi,There" into two separate columns. Can I use a regex to solve this? I came across the scala-csv parser, which I don't want to use, and I hope there is a better way to solve this. I know this is not a trivial problem, so it would be helpful if people could share their approaches.

Recommended answer

Use the uniVocity-parsers CsvParser for this instead of parsing by hand. CSV is much harder than you think, and there are many corner cases to cover. You just found one. In short, you need a library to read CSV reliably. uniVocity-parsers is used by other Scala projects (e.g. spark-csv).

I don't know Scala, so here is an example in plain Java, but you'll get the idea:

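import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class CsvExample {
    public static void main(String... args) {
        CsvParserSettings settings = new CsvParserSettings(); // many options here, check the documentation
        CsvParser parser = new CsvParser(settings);
        // The quoted field "Hi,There" is returned as a single value
        String[] row = parser.parseLine("A,B,\"Hi,There\",C,D");
        for (String value : row) {
            System.out.println(value);
        }
    }
}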

Output:

A
B
Hi,There
C
D

Disclosure: I am the author of this library. It's open source and free (Apache V2.0 license).
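
For comparison, the regex route raised in the question can be sketched with only the Java standard library. This is an illustrative sketch and not part of uniVocity-parsers: the class name QuoteAwareSplit and the method splitLine are hypothetical, and the code assumes a single line with balanced double quotes where "" inside a quoted field stands for one literal quote. It does not handle line breaks inside quoted fields or malformed quoting, which are the kind of corner cases a dedicated parser is meant to cover.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    // Matches a comma only when the rest of the line contains an even
    // number of double quotes, i.e. the comma is outside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one CSV line into fields.
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty fields such as the one in "a,b,"
        for (String raw : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            String field = raw;
            boolean quoted = field.length() >= 2
                    && field.startsWith("\"")
                    && field.endsWith("\"");
            if (quoted) {
                // Drop the enclosing quotes and unescape doubled quotes
                field = field.substring(1, field.length() - 1)
                             .replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("A,B,\"Hi,There\",C,D"));
    }
}
```

Running main prints [A, B, Hi,There, C, D]. The lookahead rescans the remainder of the line at every comma, so this is only reasonable for short lines; for real files a parser library remains the safer choice.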
