Reading Unicode characters in Java
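I'm fairly new to Java. When I assign a Unicode string to a variable and print it
String str = "\u0142o\u017Cy\u0142";
System.out.println(str);
the actual characters are printed. But when I read the same text from a file: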
final StringBuilder stringBuilder = new StringBuilder();
InputStream inStream = new FileInputStream("C:/a.txt");
final InputStreamReader streamReader = new InputStreamReader(inStream, "UTF-8");
final BufferedReader bufferedReader = new BufferedReader(streamReader);
String line = "";
while ((line = bufferedReader.readLine()) != null) {
    System.out.println(line);
    stringBuilder.append(line);
}
Why do the results differ between the two cases? The file a.txt contains the same string, but when I print the contents of the file I get z\u0142o\u017Cy\u0142
instead of the actual Unicode characters. How can I make the file content print the same way the string literal does?
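Your code should be correct, but I guess that the file "a.txt" does not contain the Unicode characters encoded as UTF-8; it contains the escaped string "\u0142o\u017Cy\u0142" as literal text. The Java compiler translates \uXXXX escapes only when they appear in source code. Text read from a file at run time is not processed that way, so the backslashes come through unchanged.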
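To illustrate the difference, here is a minimal sketch. It assumes the file really does hold the escape sequences as plain ASCII text, and it uses an in-memory byte array as a stand-in for a.txt. The class name UnicodeEscapeDemo and the helper decodeEscapes are hypothetical and not part of the original code. The decoder is a simplified workaround for when the file cannot be fixed: it handles only a backslash, a single 'u' and exactly four hex digits.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDemo {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Decodes the bytes as UTF-8 and returns the first line, using the same
    // reader chain as the question. The byte array stands in for the file.
    static String readFirstLine(byte[] fileBytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: replaces each backslash-u escape found in the text
    // with the character it names. Everything else is copied unchanged.
    static String decodeEscapes(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement guards against decoded '$' or backslash characters.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The "file" contains the escapes as 21 plain ASCII characters.
        byte[] escapedFile = "z\\u0142o\\u017Cy\\u0142".getBytes(StandardCharsets.US_ASCII);
        String line = readFirstLine(escapedFile);

        System.out.println(line.length());                 // 21: nothing was decoded
        System.out.println(decodeEscapes(line).length());  // 6: escapes turned into characters
        System.out.println(decodeEscapes(line).equals("z\u0142o\u017Cy\u0142")); // true
    }
}
```

If the file can be re-saved with the real characters in UTF-8, no decoding step is needed and the reader chain from the question is sufficient.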
Please check whether the text file is correct using a UTF-8-aware editor, such as a recent version of Notepad or Notepad++ on Windows. Alternatively, inspect it with your favorite hex editor: it should not contain backslashes.
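If no hex editor is at hand, the same check can be sketched in a few lines of Java. The class name FileByteInspector and its helpers are hypothetical. The file to inspect is taken from the first command-line argument, and a built-in sample is used when none is given.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class FileByteInspector {

    // Renders each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // In UTF-8, the byte 0x5C can only be an ASCII backslash, because every
    // byte of a multi-byte sequence has its high bit set.
    static boolean containsBackslash(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0x5C) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Without an argument, fall back to a sample holding the real characters.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "\u0142o\u017Cy\u0142".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(bytes));
        System.out.println("contains backslash: " + containsBackslash(bytes));
    }
}
```

For the built-in sample this prints C5 82 6F C5 BC 79 C5 82 followed by "contains backslash: false". A file holding the escaped text would instead show 5C 75 (backslash, 'u') followed by the ASCII codes of the hex digits.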
I tried it with "€" as the UTF-8-encoded content of the file, and it was printed correctly. Note that, depending on your terminal encoding (a real hassle on Windows) and font, not every Unicode character can be displayed.
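One way to take the platform default encoding out of the picture is to wrap System.out in a PrintStream with an explicit charset. This is a sketch only. The class name Utf8Console and the helper utf8Stream are hypothetical, and whether the characters show up correctly still depends on the terminal being set to UTF-8 (for example via chcp 65001 on Windows) and on the font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

    // Wraps the target stream so that all text written through it is encoded
    // as UTF-8, regardless of the platform default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // The euro sign is written as the three bytes E2 82 AC.
        out.println("\u20AC");
    }
}
```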