CSV parsing with Commons CSV - Quotes within quotes causing IOException

Question

I am using Commons CSV to parse CSV content relating to TV shows. One of the shows has a show name which includes double quotes;

116,6,2,29 Sep 10,""JJ" (60 min)","http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"

The show name is "JJ" (60 min), which already contains double quotes. Parsing this line throws java.io.IOException: (line 1) invalid char between encapsulated token and delimiter.

    // Requires java.io.StringReader, java.util.ArrayList, java.util.List
    // and org.apache.commons.csv.CSVFormat, CSVParser, CSVRecord.
    ArrayList<String> allElements = new ArrayList<>();

    // try-with-resources closes the parser even when getRecords() throws
    try (CSVParser csvFileParser = new CSVParser(new StringReader(line), CSVFormat.DEFAULT)) {
        List<CSVRecord> csvRecords = csvFileParser.getRecords();

        for (CSVRecord record : csvRecords) {
            for (int x = 0; x < record.size(); x++) {
                allElements.add(record.get(x));
            }
        }
    }

    return allElements;

CSVFormat.DEFAULT already sets withQuote('"')

I think this CSV is not properly formatted: ""JJ" (60 min)" should be """JJ"" (60 min)". Is there a way to get Commons CSV to handle this, or do I need to fix this entry manually?

Additional information: Other show names contain spaces and commas within the CSV entry and are placed within double quotes.
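
For reference, the quoting rule the question relies on (RFC 4180 style: wrap a field in double quotes when it contains a quote, comma, or line break, and double every embedded quote) can be sketched in plain Java. The class and method names below are hypothetical and only illustrate the rule; this is not part of Commons CSV.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating RFC 4180 style field escaping.
final class Rfc4180Writer {

    // Quote a field only when needed and double any embedded quote characters.
    static String escapeField(String field) {
        boolean needsQuotes = field.contains("\"") || field.contains(",")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join already-unescaped field values into one CSV line.
    static String toLine(List<String> fields) {
        return fields.stream()
                .map(Rfc4180Writer::escapeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The show name "JJ" (60 min) is written as """JJ"" (60 min)"
        System.out.println(toLine(List.of("116", "6", "2", "29 Sep 10", "\"JJ\" (60 min)")));
    }
}
```

A file written this way is what CSVFormat.DEFAULT expects to read, which is why the line in the question, with its undoubled inner quotes, is rejected.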

Recommended answer

The problem here is that the quotes are not properly escaped, and Commons CSV does not handle that. Try univocity-parsers: it is the only Java parser I know of that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:

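import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.UnescapedQuoteHandling;

// configure the parser: when an unescaped quote appears inside a quoted value,
// keep it as part of the value and continue until the closing quote
CsvParserSettings settings = new CsvParserSettings();
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);

// create the parser
CsvParser parser = new CsvParser(settings);

// parse your line
String[] out = parser.parseLine("116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"");

// print one parsed field per line
for (String e : out) {
    System.out.println(e);
}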

This will print:

116
6
2
29 Sep 10
"JJ" (60 min)
http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj

Hope it helps.

Disclosure: I'm the author of this library, it's open source and free (Apache 2.0 license)
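
If switching libraries is not an option, the "fix this entry manually" route from the question can be automated by repairing each line before handing it to Commons CSV. The following is only a rough sketch of that idea, not something either library provides: the class QuoteRepair and its methods are hypothetical, and it assumes comma-delimited, single-line records in which a quote inside a quoted field closes the field only when it is followed by a comma or the end of the line. Any other quote inside a quoted field is treated as literal and doubled.

```java
// Hypothetical pre-processing helper: doubles stray quotes inside quoted fields.
final class QuoteRepair {

    static String repairLine(String line) {
        StringBuilder out = new StringBuilder(line.length() + 8);
        boolean inQuotes = false;
        boolean fieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!inQuotes) {
                if (c == '"' && fieldStart) {
                    inQuotes = true;            // opening quote of a quoted field
                }
                out.append(c);
                fieldStart = (c == ',');
                continue;
            }
            if (c != '"') {
                out.append(c);                  // ordinary character inside quotes
            } else if (endsField(line, i + 1)) {
                inQuotes = false;               // closing quote
                out.append('"');
            } else if (line.charAt(i + 1) == '"' && !endsField(line, i + 2)) {
                out.append("\"\"");             // already-escaped pair: copy as-is
                i++;
            } else {
                out.append("\"\"");             // stray quote: double it
            }
        }
        return out.toString();
    }

    // True when the position is the end of the line or a delimiter.
    private static boolean endsField(String line, int index) {
        return index >= line.length() || line.charAt(index) == ',';
    }

    public static void main(String[] args) {
        String broken = "116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";
        // The fifth field becomes """JJ"" (60 min)", which CSVFormat.DEFAULT accepts
        System.out.println(repairLine(broken));
    }
}
```

The heuristic is ambiguous by nature: a value that itself contains a quote directly followed by a comma would be split at that point, so the repaired output is worth spot-checking on real data.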
