Parsing a CSV string while ignoring commas inside the individual columns
Question
I am trying to split a CSV string using a comma as the delimiter.
val string = "A,B,\"Hi,There\",C,D"
I cannot use string.split(",") because it would split "Hi,There" into two separate columns. Can I solve this with a regex? I came across the scala-csv parser, which I don't want to use. I hope there is a better way to solve this. I know this is not a trivial problem, so it would be helpful if people could share their approaches.
Recommended answer
Use the uniVocity-parsers CsvParser for this instead of parsing by hand. CSV is much harder than you think, and there are many corner cases to cover. You just found one. In short, you NEED a library to read CSV reliably. uniVocity-parsers is used by other Scala projects (e.g. spark-csv).
I'll put an example in plain Java here, because I don't know Scala, but you'll get the idea:
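import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class CsvExample {
    public static void main(String... args) {
        // Default settings are enough here; many more options exist, check the documentation.
        CsvParserSettings settings = new CsvParserSettings();
        CsvParser parser = new CsvParser(settings);
        // The quoted field "Hi,There" comes back as a single value.
        String[] row = parser.parseLine("A,B,\"Hi,There\",C,D");
        for (String value : row) {
            System.out.println(value);
        }
    }
}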
Output:
A
B
Hi,There
C
D
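On the regex part of the question: for comparison only, here is a rough sketch of a regex-based split in plain Java (standard library only). It splits on a comma only when an even number of double quotes follows it, so commas inside a quoted field are left alone. The class name RegexCsvSplit and the method splitOutsideQuotes are made up for this illustration and are not part of any library. The sketch assumes one record per line and balanced quotes. It does not unescape doubled quotes ("") inside a field, it cannot handle line breaks inside quoted fields, and the lookahead rescans the rest of the line at every comma, so it is not a substitute for a real parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of uniVocity-parsers or any other library.
class RegexCsvSplit {
    // Matches a comma only if the rest of the line contains an even number of
    // double quotes, which means the comma is outside any quoted field.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] splitOutsideQuotes(String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] fields = COMMA_OUTSIDE_QUOTES.split(line, -1);
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i];
            // Strip one pair of enclosing quotes; doubled quotes inside are left as-is.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                fields[i] = field.substring(1, field.length() - 1);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String value : splitOutsideQuotes("A,B,\"Hi,There\",C,D")) {
            System.out.println(value);
        }
    }
}
```

For the sample line this prints the same five values as above. A field such as "say ""hi""" would come back with the doubled quotes still in place, which is one of the corner cases a dedicated parser takes care of.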
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).