Read file and write file which has characters in UTF-8 (different language)

Views: 153  Published: 2019/1/2 14:30:38  Tags: java, java-io

Problem description

I have a file which has characters like: "Joh 1:1 ஆதியிலே வார்த்தை இருந்தது, அந்த வார்த்தை தேவனிடத்திலிருந்தது, அந்த வார்த்தை தேவனாயிருந்தது."

www.unicode.org/charts/PDF/U0B80.pdf

When I use the following code:

bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, "UTF8"));

The output is boxes and other weird characters like this:

"�P�^��O֛��;�<�aYՠ؛"

Can anyone help?

This is the complete code:

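File f = new File("E:\\bible.docx");
// Decode the file's bytes as UTF-8 while reading.
Reader decoded = new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
bufferedWriter = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
char[] buffer = new char[1024];
int n;
StringBuilder build = new StringBuilder();
while (true) {
    n = decoded.read(buffer);
    if (n < 0) { break; }
    build.append(buffer, 0, n);
    // Write only the n chars just read; write(buffer) would also emit stale data.
    bufferedWriter.write(buffer, 0, n);
}
// Flush so that the buffered output actually reaches System.out.
bufferedWriter.flush();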

The StringBuilder value shows the UTF characters, but when they are displayed in the window they show up as boxes.

Found the answer to the problem! The encoding is correct (i.e. UTF-8): Java reads the file as UTF-8 and the String characters are UTF-8. The problem is that NetBeans' output panel has no font that can display them. After changing the font for the output panel (NetBeans -> Tools -> Options -> Misc -> Output tab) I got the expected result. The same applies when the text is displayed in a JTextArea (its font needs to be changed). But we can't change the font of the Windows cmd prompt.

Recommended answer

Because your output is encoded in UTF-8 but still contains the replacement character (U+FFFD, �), I believe the problem occurs when you read the data.
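To illustrate how the replacement character can arise at read time, here is a minimal sketch. The class name `Utf8Check`, the method `isValidUtf8`, and the sample bytes are made up for illustration and are not part of the original post. Java's ordinary decoding silently substitutes U+FFFD for malformed bytes, whereas a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead, which makes a wrong guess about the input encoding easier to detect.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Check {
    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // The strict decoder reports bad input instead of replacing it.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] tamil = "ஆதியிலே".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 0xFF}; // 0xFF can never appear in UTF-8

        System.out.println(isValidUtf8(tamil)); // true
        System.out.println(isValidUtf8(bad));   // false

        // Lenient decoding replaces the malformed byte with U+FFFD.
        String lenient = new String(bad, StandardCharsets.UTF_8);
        System.out.println(lenient.equals("\uFFFD")); // true
    }
}
```

If a check like this fails on the real input, the bytes are not UTF-8 text and a different charset (or a different way of reading the file) is needed before any output settings matter.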

Make sure that you know what encoding your input stream uses, and set the encoding for the InputStreamReader accordingly. If that's Tamil, I would guess it's probably in UTF-8. I don't know if Java supports TACE-16. It would look something like this…

StringBuilder text = new StringBuilder();
try (InputStream encoded = ...) {
  // Decode the raw bytes as UTF-8 while reading.
  Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8);
  char[] chunk = new char[1024];
  while (true) {
    int n = decoded.read(chunk);
    if (n < 0)
      break;
    // Append only the n chars that were actually read.
    text.append(chunk, 0, n);
  }
}
String verse = text.toString();
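For anyone who wants to try the read loop end to end, here is a self-contained sketch. It assumes a temporary file stands in for the real input, and the names `Utf8RoundTrip`, `readUtf8`, and `roundTrip` are hypothetical. It writes a sample string as UTF-8, reads it back with the same loop as in the snippet above, and prints it through a `PrintStream` that is explicitly set to UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, for illustration only.
class Utf8RoundTrip {
    // Reads the whole file as UTF-8 text in 1024-char chunks.
    static String readUtf8(Path file) {
        StringBuilder text = new StringBuilder();
        try (InputStream encoded = Files.newInputStream(file);
             Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = decoded.read(chunk)) >= 0) {
                text.append(chunk, 0, n); // append only what was read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Writes the text to a temporary file as UTF-8 and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("verse", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String verse = roundTrip("Joh 1:1 ஆதியிலே வார்த்தை இருந்தது");
        // Encode the output as UTF-8 regardless of the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(verse);
    }
}
```

Whether the printed text is legible still depends on the console or output panel: it must interpret the bytes as UTF-8 and have a font that covers the script.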
