Reading Unicode characters in Java


Question

I'm fairly new to Java. When I assign a Unicode string to a variable and print it, and then read the same text from a file and print that:

  String str = "\u0142o\u017Cy\u0142";
  System.out.println(str);

  final StringBuilder stringBuilder = new StringBuilder();
  InputStream inStream = new FileInputStream("C:/a.txt");
  final InputStreamReader streamReader = new InputStreamReader(inStream, "UTF-8");
  final BufferedReader bufferedReader = new BufferedReader(streamReader);
  String line = "";
  while ((line = bufferedReader.readLine()) != null) {
      System.out.println(line);
      stringBuilder.append(line);
  }

Why are the results different in the two cases? The file a.txt contains the same string, but when I print the file's contents it prints z\u0142o\u017Cy\u0142 instead of the actual Unicode characters. How can I get the file content printed the same way the string is printed?

Solution

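Your code should be correct, but I guess that the file "a.txt" does not contain the Unicode characters encoded as UTF-8, but rather the literal escaped text "\u0142o\u017Cy\u0142". The \uXXXX escapes are translated by the Java compiler in source code only; a reader returns the file content as it is, backslashes included.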
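To make the difference concrete, here is a minimal sketch. It compares a string literal whose escapes the compiler has already translated with the same text written with doubled backslashes, which is how it would look if read from a file that holds the escapes as plain text. The class name and the `decodeUnicodeEscapes` helper are hypothetical, introduced only for illustration. The helper handles plain four-digit escapes and nothing more, so treat it as a fallback if the file itself cannot be fixed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each literal backslash-u escape in the text
    // with the character it names. Anything else is copied through unchanged.
    static String decodeUnicodeEscapes(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuilder result = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            result.append(text, last, matcher.start());
            result.append((char) Integer.parseInt(matcher.group(1), 16));
            last = matcher.end();
        }
        result.append(text, last, text.length());
        return result.toString();
    }

    public static void main(String[] args) {
        // The compiler turns each escape into a single character: 5 chars in total.
        String compiled = "\u0142o\u017Cy\u0142";
        // Doubled backslashes keep the escapes as plain text: 20 chars in total.
        String fromFile = "\\u0142o\\u017Cy\\u0142";

        System.out.println(compiled.length());
        System.out.println(fromFile.length());
        System.out.println(decodeUnicodeEscapes(fromFile).equals(compiled));
    }
}
```

If the file is saved with the real characters in UTF-8, no decoding step is needed and the code in the question can stay as it is.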

Please check whether the text file is correct, using a UTF-8-aware editor such as a recent version of Notepad or Notepad++ on Windows. Alternatively, inspect it with your favorite hex editor. It should not contain backslashes.
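If no hex editor is at hand, the same check can be sketched in Java. The class and method names below are hypothetical. The example works on in-memory byte arrays so that it runs anywhere; for a real file the bytes could come from `Files.readAllBytes(Paths.get("C:/a.txt"))`. Looking for the byte 0x5C is a reliable backslash test in UTF-8, because every byte of a multi-byte sequence is 0x80 or higher.

```java
import java.nio.charset.StandardCharsets;

public class FileBytesCheck {
    // Formats bytes as space-separated two-digit hex, as a hex editor would show them.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Returns true if the data contains an ASCII backslash (0x5C).
    static boolean containsBackslash(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0x5C) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // What a correctly saved file would hold for U+0142: two UTF-8 bytes.
        byte[] real = "\u0142".getBytes(StandardCharsets.UTF_8);
        // What a file with the escape typed out holds: six ASCII bytes.
        byte[] escaped = "\\u0142".getBytes(StandardCharsets.UTF_8);

        System.out.println(toHex(real));    // C5 82
        System.out.println(toHex(escaped)); // 5C 75 30 31 34 32
        System.out.println(containsBackslash(escaped)); // true
    }
}
```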

I tried it with "€" as the UTF-8-encoded content of the file, and it is printed correctly. Note that not every Unicode character can be printed; this depends on your terminal encoding (a real hassle on Windows) and on the font.
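One way to take the JVM's default output encoding out of the picture is to wrap `System.out` in a `PrintStream` with an explicit encoding. The sketch below is only an illustration with hypothetical names (`Utf8Output`, `printedBytes`); whether the characters then show up correctly still depends on the terminal's own encoding and font, which Java does not control.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Output {
    // Returns the bytes that a PrintStream using the given encoding emits for the text.
    static byte[] printedBytes(String text, String encoding)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, encoding);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The euro sign takes three bytes in UTF-8 ...
        System.out.println(printedBytes("\u20AC", "UTF-8").length);
        // ... but has no mapping in ISO-8859-1, so a single substitute byte is written.
        System.out.println(printedBytes("\u20AC", "ISO-8859-1").length);

        // Print through a stream that always encodes as UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("\u20AC");
    }
}
```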
