Parsing Chinese characters in Java showing weird behaviour


Problem description

I have a CSV file in which some fields contain Chinese character strings. Unfortunately, I don't know the encoding of this input CSV file. I am trying to read this input CSV and, using selected fields from it, generate an HTML file and another CSV file as output.

While reading the CSV input, I tried every encoding from the list at http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html that mentions Chinese in its description, and found that if I use

InputStreamReader read = new InputStreamReader(new FileInputStream(filepath), "GB18030");

for reading the CSV, and

OutputStreamWriter osW=new OutputStreamWriter(objBufferedOutputStream,"UTF-16");

for writing the HTML and CSV, my output doesn't show weird characters.
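
Trying encodings one by one and judging the result by eye is error-prone, because `InputStreamReader` silently substitutes the replacement character U+FFFD for bytes it cannot decode instead of failing. A more systematic version of this trial, sketched under the assumption that the file fits in memory: read the raw bytes once, then decode them with a `CharsetDecoder` set to `CodingErrorAction.REPORT`, so that a charset which cannot decode the input throws instead of guessing. The class name `CharsetProbe` and the candidate list are illustrative assumptions. A charset that decodes without errors is not automatically the right one (GB18030 accepts many byte sequences that were never written in it), whereas strict UTF-8 rejects most input that is not really UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CharsetProbe {

    // Illustrative candidate list (an assumption); edit it to match the encodings you want to test.
    static final String[] CANDIDATES = {
        "UTF-8", "UTF-16", "GB18030", "GBK", "GB2312", "Big5", "Big5-HKSCS"
    };

    // True if the bytes decode in this charset without any malformed or unmappable input.
    static boolean decodesCleanly(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data)); // throws on the first decoding error
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Candidate charsets supported by this JVM that accept the bytes without errors.
    static List<String> cleanCandidates(byte[] data) {
        List<String> accepted = new ArrayList<>();
        for (String name : CANDIDATES) {
            if (Charset.isSupported(name) && decodesCleanly(data, Charset.forName(name))) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        // Read the raw bytes; no charset is applied at this point.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Decodes without errors as: " + cleanCandidates(data));
    }
}
```

Run it as `java CharsetProbe input.csv`. If UTF-8 appears in the list, it is usually the best first guess, since bytes from a legacy encoding rarely happen to form valid UTF-8.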

However, there are two problems:


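  1. The output shows strings completely different from the input!
    I mean that even though I don't process any of the strings in my code, the output can't be found in any field of the input CSV.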

For example, my input has the Chinese string 陈真珍 in field number 8, but my output HTML has something like 闄堢湡鐝� in the position that corresponds to input field number 8.


  2. As you can see, there is a question mark in the output 闄堢湡鐝�, i.e. the Unicode replacement character (U+FFFD).

Could you please help me trace where the mistake might be?

PS: Also, I checked Google Translate and found that the input string 陈真珍 means something like "Chen Zhen Zhen", while its corresponding output string 闄堢湡鐝� means something like "Yaobaoyujue". So both the meaning and the characters themselves differ.

Recommended answer

That output means that your input is NOT in GB18030 encoding.
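
The specific garbling in the question is consistent with the input actually being UTF-8. The UTF-8 bytes of 陈真珍 are E9 99 88 E7 9C 9F E7 8F 8D; a GB18030 decoder reads them as the double-byte pairs E9 99, 88 E7, 9C 9F and E7 8F, which produces four unrelated characters, and the final 8D is left without a partner, so it becomes U+FFFD. A minimal sketch reproducing this, assuming UTF-8 input (the class name `MojibakeDemo` is illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then wrongly decodes those bytes as GB18030,
    // mimicking a UTF-8 file read through InputStreamReader(..., "GB18030").
    static String utf8ReadAsGb18030(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // new String(byte[], Charset) replaces undecodable input with U+FFFD.
        return new String(utf8Bytes, Charset.forName("GB18030"));
    }

    // Decoding the bytes with the charset they were written in round-trips.
    static String utf8ReadAsUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Space-separated uppercase hex, e.g. "E9 99 88".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u9648\u771F\u73CD"; // 陈真珍, written as escapes to avoid source-encoding issues
        System.out.println("UTF-8 bytes:     " + hex(original.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Read as GB18030: " + utf8ReadAsGb18030(original));
        System.out.println("Read as UTF-8:   " + utf8ReadAsUtf8(original));
    }
}
```

The first line prints `E9 99 88 E7 9C 9F E7 8F 8D`. Whether the Chinese lines display correctly depends on the console's own encoding, which is one more place where a wrong guess can hide the real problem. If this reproduces the question's output, reading the input with `StandardCharsets.UTF_8` instead of `"GB18030"` is the likely fix.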

Also, please check and double-check how you view your files: which encoding does the program you open them with use, especially for the input file? Text files (including CSV files) usually carry no metadata stating their encoding, so editors have to guess, and that guess can easily be wrong.
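
One way to take the editor's guess out of the picture is to look at the raw bytes. A minimal sketch, assuming the first 64 bytes are enough and Java 11 or later (for `InputStream.readNBytes(int)`): it prints those bytes in hex and reports whether the file starts with one of the common byte order marks. The class name `ByteInspector` and the small BOM table are illustrative; most files have no BOM, in which case the hex dump itself is the evidence (Chinese characters from the main CJK block encoded as UTF-8 appear as three-byte sequences whose first byte is E4 to E9).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ByteInspector {

    // Describes a common byte order mark at the start of the data, or returns null if none.
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8 BOM";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE BOM";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE BOM";
        return null;
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Space-separated uppercase hex, e.g. "EF BB BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            head = in.readNBytes(64); // at most 64 bytes; fewer if the file is shorter
        }
        System.out.println("First bytes: " + hex(head));
        String bom = detectBom(head);
        System.out.println(bom != null ? "Found " + bom : "No BOM found");
    }
}
```

As a sanity check, the output files written with `new OutputStreamWriter(..., "UTF-16")` should report a UTF-16BE BOM, because Java's `UTF-16` encoder writes a big-endian byte order mark.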
