CSV parsing with Commons CSV - Quotes within quotes causing IOException
Problem description
I am using Commons CSV to parse CSV content relating to TV shows. One of the shows has a name that includes double quotes:
116,6,2,29 Sep 10,""JJ" (60 min)","http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"
The show name is "JJ" (60 min), which is already wrapped in double quotes. Parsing this line throws java.io.IOException: (line 1) invalid char between encapsulated token and delimiter.
ArrayList<String> allElements = new ArrayList<String>();
CSVFormat csvFormat = CSVFormat.DEFAULT;
CSVParser csvFileParser = new CSVParser(new StringReader(line), csvFormat);
List<CSVRecord> csvRecords = null;
csvRecords = csvFileParser.getRecords();
for (CSVRecord record : csvRecords) {
    int length = record.size();
    for (int x = 0; x < length; x++) {
        allElements.add(record.get(x));
    }
}
csvFileParser.close();
return allElements;
CSVFormat.DEFAULT already sets withQuote('"').
I think this CSV is not properly formatted, since ""JJ" (60 min)" should be """JJ"" (60 min)". Is there a way to get Commons CSV to handle this, or do I need to fix this entry manually?
Additional information: other show names contain spaces and commas within the CSV entry and are placed within double quotes.
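For reference, the escaping rule mentioned above (wrap the value in double quotes and double every embedded quote) can be sketched as follows. The class and method names are hypothetical and only illustrate the rule; this is not the code that produced the CSV in question.

```java
final class CsvQuoting {

    // Hypothetical helper: wraps a value in double quotes and doubles
    // every embedded double quote, as a well-formed quoted field requires.
    static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // prints """JJ"" (60 min)"
        System.out.println(quoteField("\"JJ\" (60 min)"));
    }
}
```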
Recommended answer
The problem here is that the quotes are not properly escaped, and your parser doesn't handle that. Try univocity-parsers, as it is the only parser for Java I know of that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:
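import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.UnescapedQuoteHandling;

// configure the parser to treat unescaped quotes as part of the value
// until a real closing quote is found
CsvParserSettings settings = new CsvParserSettings();
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
// create the parser
CsvParser parser = new CsvParser(settings);
// parse your line
String[] out = parser.parseLine("116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"");
for (String e : out) {
    System.out.println(e);
}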
This will print:
116
6
2
29 Sep 10
"JJ" (60 min)
http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj
Hope it helps.
Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license).
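If switching libraries is not an option, another possibility is to repair the line before handing it to Commons CSV. The following is only a minimal sketch under stated assumptions, not a feature of either library: the class and method names are hypothetical, each record is assumed to sit on a single line, ',' is assumed to be the delimiter and '"' the quote character. Inside a quoted field, a quote is treated as the closing quote only when it is followed by a delimiter or the end of the line; any other lone quote is doubled.

```java
final class UnescapedQuoteRepair {

    // Hypothetical pre-processing helper (assumption: single-line records,
    // ',' as delimiter, '"' as quote). Doubles stray quotes inside quoted fields.
    static String repair(String line) {
        StringBuilder out = new StringBuilder(line.length() + 8);
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean last = i + 1 == line.length();
            // Treat the end of the line like a delimiter.
            char next = last ? ',' : line.charAt(i + 1);
            if (inQuotes) {
                if (c != '"') {
                    out.append(c);
                } else if (next == ',') {
                    out.append('"');        // closing quote of the field
                    inQuotes = false;
                } else if (next == '"') {
                    out.append("\"\"");     // already escaped pair, keep as is
                    i++;
                } else {
                    out.append("\"\"");     // stray quote, escape it by doubling
                }
            } else {
                if (c == '"' && atFieldStart) {
                    inQuotes = true;        // opening quote of a quoted field
                }
                out.append(c);
                atFieldStart = (c == ',');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";
        // prints 116,6,2,29 Sep 10,"""JJ"" (60 min)","http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"
        System.out.println(repair(line));
    }
}
```

The repaired string can then be passed to new CSVParser(new StringReader(...), CSVFormat.DEFAULT) as in the question. The heuristic is inherently ambiguous: a stray quote that happens to be followed by a comma is taken as the end of the field, and a stray quote immediately before the real closing quote is read as an escaped pair, so input of that shape still needs manual fixing.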