Read and write a file containing UTF-8 characters (non-English text)
Question
I have a file which has characters like: "Joh 1:1 ஆதியிலே வார்த்தை இருந்தது, அந்த வார்த்தை தேவனிடத்திலிருந்தது, அந்த வார்த்தை தேவனாயிருந்தது."
www.unicode.org/charts/PDF/U0B80.pdf
When I use the following code:
bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, "UTF8"));
The output is boxes and other weird characters like this:
"�P�^����O֛���;�<�aYՠ؛"
Can anyone help?
This is the complete code:
File f = new File("E:\\bible.docx");
Reader decoded = new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
bufferedWriter = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
char[] buffer = new char[1024];
int n;
StringBuilder build = new StringBuilder();
while (true) {
    n = decoded.read(buffer);
    if (n < 0) { break; }
    build.append(buffer, 0, n);
    // Write only the n characters just read, not the whole 1024-char buffer.
    bufferedWriter.write(buffer, 0, n);
}
// Flush so that buffered characters actually reach System.out.
bufferedWriter.flush();
The StringBuilder value shows the correct characters, but when they are displayed in the window they appear as boxes.
Found the answer to the problem! The encoding is correct (i.e. UTF-8): Java reads the file as UTF-8 and the String holds the right characters. The problem is that NetBeans' output panel has no font that can display them. After changing the font for the output panel (NetBeans -> Tools -> Options -> Misc -> Output tab), I got the expected result. The same applies when the text is displayed in a JTextArea (its font needs to be changed). But we can't change the font of the Windows cmd prompt.
Recommended answer
Because your output is encoded in UTF-8 but still contains the replacement character (U+FFFD, �), I believe the problem occurs when you read the data.
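To illustrate how U+FFFD can end up in the text at read time, here is a minimal sketch. The class and method names (`ReplacementCharDemo`, `decodeUtf8`, `hasReplacementChar`) are made up for this example and are not from the question. It relies on the fact that `new String(bytes, charset)` substitutes the replacement character for malformed input, so bytes that are not valid UTF-8 produce U+FFFD as soon as they are decoded, before anything is written.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Decode raw bytes as UTF-8. Malformed byte sequences are not rejected;
    // the decoder silently substitutes U+FFFD for them.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // True if the decoded text contains the replacement character.
    static boolean hasReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // A Tamil word written with Unicode escapes so the source file
        // compiles the same way regardless of its own encoding.
        String word = "\u0BB5\u0BBE\u0BB0\u0BCD\u0BA4\u0BCD\u0BA4\u0BC8";
        byte[] valid = word.getBytes(StandardCharsets.UTF_8);

        // 0xFF can never appear in well-formed UTF-8 (hypothetical bad input).
        byte[] invalid = {(byte) 0xFF, 'J', 'o', 'h'};

        System.out.println(hasReplacementChar(decodeUtf8(valid)));
        System.out.println(hasReplacementChar(decodeUtf8(invalid)));
    }
}
```

If a check like this reports replacement characters in the decoded text, the input bytes were not what the reader expected, and no choice of output encoding or font will bring the original characters back.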
Make sure that you know what encoding your input stream uses, and set the encoding for the InputStreamReader accordingly. If that's Tamil, I would guess it's probably in UTF-8. I don't know if Java supports TACE-16. It would look something like this…
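import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

StringBuilder text = new StringBuilder();
try (InputStream encoded = ...) {
    // Decode explicitly as UTF-8 rather than relying on the platform default.
    Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8);
    char[] buffer = new char[1024];
    while (true) {
        int n = decoded.read(buffer);
        if (n < 0)
            break;
        // Append only the n characters actually read in this pass.
        text.append(buffer, 0, n);
    }
}
String verse = text.toString();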
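As a way to check the read path independently of any console or font, the loop above can be wrapped in a small round-trip test. This is only a sketch under stated assumptions: the class `Utf8RoundTrip` and its helpers `readAll` and `writeAll` are hypothetical names, and it uses a temporary plain-text file rather than the file from the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {
    // Read the whole file, decoding its bytes as UTF-8.
    static String readAll(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (InputStream in = Files.newInputStream(file);
             Reader decoded = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = decoded.read(chunk)) >= 0) {
                text.append(chunk, 0, n); // only the chars actually read
            }
        }
        return text.toString();
    }

    // Write the text to the file, encoding it as UTF-8.
    static void writeAll(Path file, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // "Joh 1:1" followed by the first Tamil word of the verse, as escapes.
        String verse = "Joh 1:1 \u0B86\u0BA4\u0BBF\u0BAF\u0BBF\u0BB2\u0BC7";
        Path tmp = Files.createTempFile("verse", ".txt");
        try {
            writeAll(tmp, verse);
            // Compare in memory; this does not depend on how a console renders text.
            System.out.println(verse.equals(readAll(tmp)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Comparing strings in memory separates the two possible failure points: if the comparison succeeds but the console still shows boxes, the decoding is fine and the display (font or console encoding) is the part to look at.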