Reading a strange Unicode character in Java?
Problem description
I have the following text file:
The file was saved with UTF-8 encoding.
I used the following code to read the content of the file:
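import java.io.BufferedReader;
import java.io.FileReader;

// FileReader decodes f.txt with the platform default charset
FileReader fr = new FileReader("f.txt");
BufferedReader br = new BufferedReader(fr);
String s1 = br.readLine();
String s2 = br.readLine();
System.out.println("s1 = " + s1.length());
System.out.println("s2 = " + s2.length());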
The output:
s1 = 5
s2 = 4
Then I used s1.charAt(0) to get the first character of s1, and it was a '' (blank) character. That's why s1 has a length of 5. Even after calling s1.trim(), its length is still 5.
I don't know why this happens. It works correctly if the file is saved with ASCII encoding.
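The observation can be reproduced without a file. This minimal sketch assumes the first line is a hypothetical "abcd" preceded by the UTF-8 byte order mark (the bytes EF BB BF); the real file content is not shown in the question. Decoding those bytes as UTF-8 yields a five-character string whose first character is U+FEFF, and trim() leaves it in place because it only removes characters up to U+0020. The class name BomDemo is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical first line: the UTF-8 BOM bytes EF BB BF followed by "abcd"
    static final byte[] LINE = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c', 'd'};

    static String decode(byte[] bytes) {
        // The UTF-8 decoder keeps the BOM as the character U+FEFF
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s1 = decode(LINE);
        System.out.println("length = " + s1.length());                        // prints "length = 5"
        System.out.println("starts with BOM: " + (s1.charAt(0) == '\uFEFF'));  // prints "starts with BOM: true"
        // trim() only strips characters up to U+0020, so U+FEFF survives
        System.out.println("trimmed length = " + s1.trim().length());        // prints "trimmed length = 5"
    }
}
```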
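Notepad apparently saved the file with a byte order mark (BOM): a nonprintable character (U+FEFF, stored as the bytes EF BB BF in UTF-8) at the beginning that just marks the file as UTF-8 but is not required (and indeed not recommended). Java's UTF-8 decoder keeps it as an ordinary character, and trim() does not remove it because trim() only strips characters up to U+0020. You can ignore or remove it; other text editors often let you choose between UTF-8 with or without a BOM.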
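One way to remove the BOM while reading, sketched under these assumptions: the file is decoded explicitly as UTF-8 with Files.newBufferedReader instead of FileReader (which uses the platform default charset), and a leading U+FEFF is dropped from the first line only. The class and method names (BomSkippingReader, readLines) are hypothetical; f.txt is the file name from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BomSkippingReader {
    // U+FEFF is how a UTF-8 BOM (bytes EF BB BF) appears after decoding
    private static final String BOM = "\uFEFF";

    // Hypothetical helper: reads all lines, dropping a BOM at the start of the first line
    static List<String> readLines(BufferedReader br) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                // Only the very first character of the stream can be a BOM
                if (lines.isEmpty() && line.startsWith(BOM)) {
                    line = line.substring(BOM.length());
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Decode explicitly as UTF-8 instead of relying on the platform default
        Path path = Paths.get(args.length > 0 ? args[0] : "f.txt");
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            List<String> lines = readLines(br);
            for (int i = 0; i < lines.size(); i++) {
                System.out.println("s" + (i + 1) + " = " + lines.get(i).length());
            }
        }
    }
}
```

Run as java BomSkippingReader f.txt; with a BOM-prefixed file, the first line's length should no longer count the BOM. Stripping only the first line keeps a U+FEFF that legitimately appears later in the text untouched. If Apache Commons IO is already a dependency, its BOMInputStream can handle the same job at the byte level.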