Parse CSV files that contain Unicode characters using OpenCSV


Problem description

I parse a .csv file with OpenCSV in NetBeans 6.0.1. My file contains some Unicode characters, and when I write them to the output they appear in another form (like (HJ1'-E/;). When I open the file in Notepad, it looks fine.

The code that I used:

    CSVReader reader = new CSVReader(new FileReader("d:\\a.csv"), ',', '\'', 1);
    String[] line;
    while ((line = reader.readNext()) != null) {
        StringBuilder stb = new StringBuilder(400);
        for (int i = 0; i < line.length; i++) {
            stb.append(line[i]);
            stb.append(";");
        }
        System.out.println(stb);
    }

Thanks very much in advance.

Solution

First you need to know what encoding your file is in, such as UTF-8 or UTF-16. What's generating this file to start with?

After that, it's relatively straightforward - you need to create a FileInputStream wrapped in an InputStreamReader instead of just a FileReader. (FileReader uses the platform default encoding unless you pass a Charset, and the constructors that accept one were only added in Java 11.) Specify the encoding to use when you create the InputStreamReader, and if you've picked the right one, everything should start working.

Note that you don't need to use OpenCSV to check this - you could just read the text of the file yourself and print it all out. I'm not sure I'd trust System.out to be able to handle non-ASCII characters though - you may want to find a different way of examining strings, such as printing out the individual values of characters as integers (preferably in hex) and then comparing them with the charts at unicode.org. On the other hand, you could try the right encoding and see what happens to start with...
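To make the "print the characters as hex" check concrete, here is a minimal sketch that uses only the JDK. The class name HexDump and its helper methods are made up for illustration, and UTF-8 is only a guess at the file's encoding; swap in whichever charset you are testing.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {
    // Formats each code point of the text as U+XXXX, separated by spaces,
    // so the values can be compared with the charts at unicode.org.
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Decodes the whole file with the given charset and dumps its code points.
    static String dump(String path, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return toCodePoints(new String(bytes, charset));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        // Assumption: the file is UTF-8; change the charset to test another guess.
        System.out.println(dump(args[0], StandardCharsets.UTF_8));
    }
}
```

If the dumped values match the characters you expect, the decoding is right and any remaining garbling comes from how the console displays them. If they do not match, the charset guess is probably wrong.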

EDIT: Okay, so if you're using UTF-8:

    CSVReader reader = new CSVReader(
        new InputStreamReader(new FileInputStream("d:\\a.csv"), "UTF-8"),
        ',', '\'', 1);
    String[] line;
    while ((line = reader.readNext()) != null) {
        StringBuilder stb = new StringBuilder(400);
        for (int i = 0; i < line.length; i++) {
            stb.append(line[i]);
            stb.append(";");
        }
        System.out.println(stb);
    }

(I hope you have a try/finally block to close the file in your real code.)
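On Java 7 or later, try-with-resources is a shorter way to get the same guarantee as try/finally. The sketch below uses only JDK classes so that it runs without OpenCSV on the classpath; the class name ReadWithCharset and the method readLines are made up for illustration. As far as I know, CSVReader implements Closeable, so it should be possible to declare it in the try header in the same way, but check that against the OpenCSV version you use.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Reads all lines of a file, decoding its bytes with the given charset.
    // try-with-resources closes the reader, and the streams it wraps,
    // even if readLine() throws.
    static List<String> readLines(String path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ReadWithCharset <file>");
            return;
        }
        // Assumption: the file is UTF-8, as in the answer above.
        for (String line : readLines(args[0], StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```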
