Java String encoding (UTF-8)
Question
I have come across this line of legacy code, which I am trying to figure out:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
As far as I can understand, it encodes and then decodes the string using the same charset.
How is this different from the following?
String newString = oldString;
Is there any scenario in which the two lines produce different output?
P.S.: Just to clarify, yes, I am aware of the excellent article on encoding by Joel Spolsky!
Recommended answer
This could be a complicated way of writing:
String newString = new String(oldString);
This shortens the String if the underlying char[] is much longer than the String itself (which could happen on older JVMs, before Java 7u6, where substring() shared its parent's char[]).
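As a rough sketch of how a copy differs from the plain assignment in the question: assignment only creates a second reference to the same object, while new String(...) produces a distinct object with equal content. The names CopyDemo and sameObject are made up for this illustration.

```java
class CopyDemo {
    // Hypothetical helper: true only if both variables refer to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String oldString = "example";
        String alias = oldString;            // plain assignment, no copy is made
        String copy = new String(oldString); // explicit copy, a new String object

        System.out.println(sameObject(alias, oldString)); // true
        System.out.println(sameObject(copy, oldString));  // false
        System.out.println(copy.equals(oldString));       // true
    }
}
```

For ordinary text the copy is always equal to the original, so the two only differ in identity, not in content.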
More specifically, however, the round trip checks that every character can be encoded as UTF-8.
A String can contain some "characters" that cannot be encoded, and these are turned into '?'.
Any unpaired surrogate, i.e. a char between \uD800 and \uDFFF that is not part of a valid surrogate pair, cannot be encoded and will be turned into '?'.
String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));
prints
false
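For comparison, here is a self-contained sketch of the same round trip applied to three inputs: ordinary text, a valid surrogate pair, and a lone surrogate. It assumes StandardCharsets.UTF_8 in place of the "UTF-8" charset name (which avoids the checked UnsupportedEncodingException); the class name RoundTripDemo and the helper roundTrip are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // Encode to UTF-8 bytes and decode again, like the legacy line in the question.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "h\u00e9llo";        // ordinary text with one accented letter
        String paired = "\uD83D\uDE00";     // valid surrogate pair (one code point)
        String lone = "\uD800";             // high surrogate with no partner

        System.out.println(roundTrip(plain).equals(plain));   // true
        System.out.println(roundTrip(paired).equals(paired)); // true
        System.out.println(roundTrip(lone).equals(lone));     // false

        // The lone surrogate is replaced by a single '?' byte (63) during encoding.
        System.out.println(Arrays.toString(lone.getBytes(StandardCharsets.UTF_8))); // [63]
    }
}
```

So the two lines in the question only behave differently when the String holds something UTF-8 cannot represent, such as an unpaired surrogate.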