Parse CSV file containing a Unicode character using OpenCSV

Problem description

I'm trying to parse a .csv file with OpenCSV in NetBeans 6.0.1. My file contains some Unicode characters. When I write them to the output, the characters appear in another form, like (HJ1'-E/;). When I open the file in Notepad, it looks OK.

The code I use:

CSVReader reader = new CSVReader(new FileReader("d:\\a.csv"), ',', '\'', 1);
String[] line;
while ((line = reader.readNext()) != null) {
    StringBuilder stb = new StringBuilder(400);
    for (int i = 0; i < line.length; i++) {
        stb.append(line[i]);
        stb.append(";");
    }
    System.out.println(stb);
}


Recommended answer

First you need to know what encoding your file is in, such as UTF-8 or UTF-16. What's generating this file to start with?

After that, it's relatively straightforward: you need to create a FileInputStream wrapped in an InputStreamReader instead of just a FileReader. (FileReader, as constructed in the question, always decodes with the system's default encoding.) Specify the encoding to use when you create the InputStreamReader, and if you've picked the right one, everything should start working.
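To see why the choice of encoding matters, here is a minimal sketch that decodes the same bytes through an InputStreamReader twice, once with the right charset and once with a wrong one. The class name CharsetDemo, the helper decode, and the sample text are made up for illustration, and it assumes Java 8 or later; a ByteArrayInputStream stands in for the FileInputStream so the example runs without a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of OpenCSV.
class CharsetDemo {
    // Decodes raw bytes through an InputStreamReader using an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four Arabic letters; each one takes two bytes in UTF-8.
        byte[] utf8Bytes = "\u0633\u0644\u0627\u0645".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 4 characters, as written
        System.out.println(wrong.length()); // 8 characters of mojibake
    }
}
```

With the wrong charset every byte becomes its own character, which is the kind of garbling described in the question.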

Note that you don't need to use OpenCSV to check this - you could just read the text of the file yourself and print it all out. I'm not sure I'd trust System.out to be able to handle non-ASCII characters though - you may want to find a different way of examining strings, such as printing out the individual values of characters as integers (preferably in hex) and then comparing them with the charts at unicode.org. On the other hand, you could try the right encoding and see what happens to start with...
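One way to do that hex check is sketched below. CodePointDump and toHex are hypothetical names, not part of any library. The output is plain ASCII, so it survives a console that cannot display the original characters, and each U+XXXX value can be looked up in the Unicode charts.

```java
// Hypothetical helper for inspecting what a String really contains.
class CodePointDump {
    // Formats each code point as U+XXXX, separated by single spaces.
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", codePoint));
            // Advance by 2 chars for supplementary characters, 1 otherwise.
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("H\u00e9")); // prints "U+0048 U+00E9"
    }
}
```

Calling toHex on each field returned by readNext() would show whether the decoded values match the characters you expect.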

EDIT: Okay, so if you're using UTF-8:

CSVReader reader=new CSVReader(
    new InputStreamReader(new FileInputStream("d:\\a.csv"), "UTF-8"), 
    ',', '\'', 1);
String[] line;
while ((line = reader.readNext()) != null) {
    StringBuilder stb = new StringBuilder(400);
    for (int i = 0; i < line.length; i++) {
        stb.append(line[i]);
        stb.append(";");
    }
    System.out.println(stb);
}

(I hope you have a try/finally block to close the file in your real code.)
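On Java 7 or later, try-with-resources is a shorter alternative to try/finally. The sketch below uses a BufferedReader as a stand-in so that it runs with the standard library alone; the assumption is that CSVReader can be declared in the try header the same way, since it is closeable. ClosingReadDemo and readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; BufferedReader stands in for CSVReader.
class ClosingReadDemo {
    // Reads all lines with the given charset; the reader (and the stream it
    // wraps) is closed when the try block exits, even on an exception.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // In real code this would be a FileInputStream for the CSV file.
        InputStream in = new ByteArrayInputStream(
                "name,city\nJos\u00e9,M\u00e1laga".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(in, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```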
