OpenCSV CsvToBean: First column not read for UTF-8 Without BOM

Problem description

Using OpenCSV to parse a UTF-8 document without a BOM results in the first column not being read. Giving the same document content as input, but encoded as UTF-8 with a BOM, works correctly.

I explicitly set the charset to UTF-8:

    import com.opencsv.bean.*;
    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    // The charset is set explicitly to UTF-8
    try (Reader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
        HeaderColumnNameMappingStrategy<Bean> ms = new HeaderColumnNameMappingStrategy<>();
        ms.setType(Bean.class);
        CsvToBean<Bean> csvToBean = new CsvToBeanBuilder<Bean>(reader)
                .withType(Bean.class).withMappingStrategy(ms)
                .withSeparator(';').build();
        List<Bean> beans = csvToBean.parse();
    }

I've created a sample project where the issue can be reproduced: https://github.com/dajoropo/csv2beanSample

Running the unit test, you can see that the UTF-8 file without a BOM fails, while the one with a BOM works correctly.

The failure is in the second assertion, because the first column is not read. The result is:

[Bean [a=null, b=second, c=third]]

Any hints?

Recommended answer

If I open the Bean class in your project and search for "B", I find one match. If I search for "A", I find none :) This means you copy/pasted "A" together with the BOM into the Bean class. The BOM is invisible, but it is still part of the string, so that column name only matches a header that also starts with a BOM.
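One way to spot such an invisible character is to print every character outside printable ASCII as its code point. The class and method names below are hypothetical and are not part of OpenCSV or the sample project. The sketch assumes you paste the suspicious annotation value into it:

```java
public class RevealHiddenChars {

    // Makes invisible characters visible: anything outside printable ASCII
    // is replaced by its code point, written as <U+XXXX>.
    static String reveal(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copiedFromBomFile = "\uFEFFA"; // renders as plain "A" in most editors
        System.out.println(reveal(copiedFromBomFile)); // <U+FEFF>A
        System.out.println(reveal("B"));               // B
    }
}
```

A `<U+FEFF>` prefix in the output is the BOM that made the search for "A" fail.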

If I fix "A", the other test (the file with a BOM) starts failing, but I think you can fix that using BOMInputStream.
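The BOM test breaks once "A" is fixed because Java's built-in UTF-8 decoder does not strip a BOM. The BOM arrives as the character U+FEFF at the start of the first line. The minimal sketch below shows this with plain JDK classes. The header `A;B;C` and the data row are assumed stand-ins for the sample project's files, and `BomHeaderDemo` and `firstHeaderField` are illustrative names:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomHeaderDemo {

    // Reads the first line as UTF-8 (as the question does) and returns its first ';'-separated field
    static String firstHeaderField(byte[] csv) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(csv), StandardCharsets.UTF_8))) {
            return reader.readLine().split(";")[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file content; the real sample project files may differ
        byte[] withoutBom = "A;B;C\nfirst;second;third\n".getBytes(StandardCharsets.UTF_8);

        // Same content prefixed with the three UTF-8 BOM bytes EF BB BF
        byte[] withBom = new byte[withoutBom.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(withoutBom, 0, withBom, 3, withoutBom.length);

        String bomA = "\uFEFFA"; // "A" preceded by the invisible BOM character
        System.out.println(firstHeaderField(withBom).equals(bomA));    // true
        System.out.println(firstHeaderField(withoutBom).equals(bomA)); // false
        System.out.println(firstHeaderField(withoutBom).equals("A"));  // true
    }
}
```

Because the decoder keeps the BOM, a column name containing the BOM matches only the file that has one, and a clean "A" matches only the file without one. Stripping the BOM before decoding makes both files produce the same header.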

Check this question and its answer: Byte order mark screws up file reading in Java

It is a known problem. You can use Apache Commons IO's BOMInputStream to solve it.

I just tried adding

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.6</version>
    </dependency>

        // org.apache.commons.io.input.BOMInputStream skips a leading UTF-8 BOM by default
        inputStreamReader = new InputStreamReader(new BOMInputStream(fileInputStream), StandardCharsets.UTF_8);

and fixing

    @CsvBindByName(column = "A")
    private String a;

to remove the BOM prefix from "A". This makes both tests pass.
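If you would rather not add Commons IO, you can get the same effect with only the JDK. Peek at the first three bytes through a `PushbackInputStream` and push them back unless they are the UTF-8 BOM (EF BB BF). This is an assumed alternative, not part of the original answer. `Utf8BomSkipper` and `utf8ReaderSkippingBom` are hypothetical names, and unlike BOMInputStream, this sketch only handles the UTF-8 BOM:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public final class Utf8BomSkipper {

    private Utf8BomSkipper() {}

    // Wraps the stream in a UTF-8 Reader, dropping a leading UTF-8 BOM (EF BB BF) if present
    public static Reader utf8ReaderSkippingBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so keep reading until 3 bytes or EOF
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: push the bytes back so nothing is lost
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'A', ';', 'B'};
        byte[] withoutBom = {'A', ';', 'B'};
        try (Reader a = utf8ReaderSkippingBom(new ByteArrayInputStream(withBom));
             Reader b = utf8ReaderSkippingBom(new ByteArrayInputStream(withoutBom))) {
            System.out.println((char) a.read()); // A
            System.out.println((char) b.read()); // A
        }
    }
}
```

To use it in the question's code, replace `new InputStreamReader(fileInputStream, StandardCharsets.UTF_8)` with `Utf8BomSkipper.utf8ReaderSkippingBom(fileInputStream)`. The header is then "A" for both files, so the clean `@CsvBindByName(column = "A")` works in both tests.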
