Reading strange unicode character in Java?


Question

I have the following text file:

The file was saved with utf-8 encoding.

I used the following code to read the content of the file:

FileReader fr = new FileReader("f.txt");
BufferedReader br = new BufferedReader(fr);
String s1 = br.readLine();
String s2 = br.readLine();
System.out.println("s1 = " + s1.length());
System.out.println("s2 = " + s2.length());

The output:

s1 = 5

s2 = 4

Then I tried s1.charAt(0) to get the first character of s1, and it was a '' (blank) character. That is why s1 has a length of 5. Even after calling s1.trim(), its length is still 5. I don't know why this happens. It works correctly if the file is saved with ASCII encoding.

Solution

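Notepad apparently saved the file with a byte order mark (BOM), a nonprintable character (U+FEFF) at the very beginning of the file that just marks it as UTF-8 but is not required (and indeed not recommended). That is also why trim() does not help: String.trim() only removes characters up to U+0020, and U+FEFF is not one of them. You can ignore or remove the BOM; other text editors often give you the choice of saving UTF-8 with or without a BOM.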
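One way to ignore the mark in code is to drop a leading U+FEFF from the first line after reading it. The following is only a minimal sketch, not the original poster's code: the class name BomDemo, the helper stripBom, and the sample contents "abcd" / "efgh" are assumptions made for illustration (the original file contents are not shown). It writes a temporary UTF-8 file that starts with the BOM bytes EF BB BF, reads it back with an explicit UTF-8 charset, and strips the mark from the first line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomDemo {
    // U+FEFF is the character a UTF-8 byte order mark decodes to.
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: returns the string without a leading BOM, if one is present.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Build a stand-in for a file saved by Notepad as UTF-8:
        // the three BOM bytes followed by two assumed sample lines.
        byte[] bomBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] text = "abcd\nefgh\n".getBytes(StandardCharsets.UTF_8);
        byte[] content = new byte[bomBytes.length + text.length];
        System.arraycopy(bomBytes, 0, content, 0, bomBytes.length);
        System.arraycopy(text, 0, content, bomBytes.length, text.length);

        Path file = Files.createTempFile("bom-demo", ".txt");
        try {
            Files.write(file, content);

            // Read with an explicit charset instead of the platform default.
            try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String s1 = stripBom(br.readLine()); // only the first line can carry the BOM
                String s2 = br.readLine();
                System.out.println("s1 = " + s1.length());
                System.out.println("s2 = " + s2.length());
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the assumed sample contents, both lengths print as 4. Reading through Files.newBufferedReader with StandardCharsets.UTF_8 also avoids relying on the platform default encoding, which is what FileReader uses when no charset is given.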
