Reading and writing a file that contains UTF-8 characters (different languages)

Question

I have a file that contains characters like: "Joh 1:1 ஆதியிலே வார்த்தை இருந்தது, அந்த வார்த்தை தேவனிடத்திலிருந்தது, அந்த வார்த்தை தேவனாயிருந்தது."

www.unicode.org/charts/PDF/U0B80.pdf

When I use the following code:

bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, "UTF8"));

The output is boxes and other weird characters like this:

"�P�^����O֛���;�<�aYՠ؛"

Can anyone help?

Here is the complete code:

File f = new File("E:\\bible.docx");
Reader decoded = new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
bufferedWriter = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
char[] buffer = new char[1024];
int n;
StringBuilder build = new StringBuilder();
while (true) {
    n = decoded.read(buffer);
    if (n < 0) { break; }
    build.append(buffer, 0, n);
    // Write only the n chars just read, not the whole buffer.
    bufferedWriter.write(buffer, 0, n);
}
// Flush so the buffered output actually reaches System.out.
bufferedWriter.flush();

The StringBuilder value shows the UTF-8 characters, but when it is displayed in the window they appear as boxes.

Found the answer to the problem! The encoding is correct (i.e. UTF-8): Java reads the file as UTF-8 and the String holds the right characters. The problem is that NetBeans' output panel has no font that can display them. After changing the font for the output panel (NetBeans -> Tools -> Options -> Misc -> Output tab) I got the expected result. The same applies when the text is displayed in a JTextArea (the font needs to be changed). But we can't change the font of the Windows cmd prompt.
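One way to tell a font problem apart from an encoding problem is to print the code points instead of the glyphs. The sketch below is only an illustration: the class name CodePointDump and the helper dump are made up for this example, and the sample string is the first three characters of the Tamil text above. If the dump shows code points in the Tamil block (U+0B80 to U+0BFF), the data was decoded correctly and only the display font is at fault.

```java
import java.nio.charset.StandardCharsets;

class CodePointDump {
    // Hypothetical helper: renders each code point as U+XXXX so the text
    // can be checked even when no installed font can draw it.
    static String dump(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "\u0B86\u0BA4\u0BBF"; // the Tamil characters ஆ, த, ி
        // Encode to UTF-8 bytes and decode again, as a file round trip would.
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(dump(decoded));        // U+0B86 U+0BA4 U+0BBF
        System.out.println(decoded.equals(word)); // true
    }
}
```

If the dump contains U+FFFD instead, the damage happened while decoding, and changing the font will not help.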

Recommended answer

Because your output is encoded in UTF-8 but still contains the replacement character (U+FFFD, �), I believe the problem occurs when you read the data.
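To check whether the input really is UTF-8, a strict decoder can be used in place of the lenient default. This is a minimal sketch, not part of the original answer: the class StrictUtf8Check and the method isValidUtf8 are hypothetical names, and the sample byte arrays are invented for the demonstration. It relies on CharsetDecoder with CodingErrorAction.REPORT, which throws on malformed bytes instead of silently substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "\u0B86\u0BA4\u0BBF".getBytes(StandardCharsets.UTF_8);
        byte[] notUtf8 = {(byte) 0x4A, (byte) 0xFF, (byte) 0xFE}; // 0xFF and 0xFE never occur in UTF-8
        System.out.println(isValidUtf8(text));    // true
        System.out.println(isValidUtf8(notUtf8)); // false

        // The lenient String constructor replaces the bad bytes with U+FFFD.
        String lenient = new String(notUtf8, StandardCharsets.UTF_8);
        System.out.println(lenient.indexOf('\uFFFD') >= 0); // true
    }
}
```

Running such a check on the raw bytes of the input file shows whether any U+FFFD in the output comes from the read side.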

Make sure that you know which encoding your input stream uses, and set the encoding for the InputStreamReader accordingly. If that's Tamil, I would guess it's probably in UTF-8. I don't know whether Java supports TACE-16. It would look something like this:

// Accumulates the decoded text; named differently from the char[] so the code compiles.
StringBuilder text = new StringBuilder();
try (InputStream encoded = ...) {
  Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8);
  char[] buffer = new char[1024];
  while (true) {
    int n = decoded.read(buffer);
    if (n < 0)
      break;
    // Append only the n chars that were just read.
    text.append(buffer, 0, n);
  }
}
String verse = text.toString();
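For the write side, the same loop can copy the decoded text into another UTF-8 file. The following is a sketch under assumptions rather than the asker's actual setup: the class Utf8Copy, the method copy, and the temporary files are all invented for illustration, and the input is assumed to be a plain text file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Copy {
    // Hypothetical helper: copies text from source to target,
    // decoding and re-encoding as UTF-8.
    static void copy(Path source, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = in.read(chunk)) >= 0) {
                out.write(chunk, 0, n); // write only the n chars just read
            }
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        String verse = "Joh 1:1 \u0B86\u0BA4\u0BBF";
        Path source = Files.createTempFile("verse", ".txt");
        Path target = Files.createTempFile("verse-copy", ".txt");
        Files.write(source, verse.getBytes(StandardCharsets.UTF_8));

        copy(source, target);

        String copied = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println(copied.equals(verse)); // true
    }
}
```

Passing the charset explicitly on both sides keeps the result independent of the platform default encoding.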
