Java: why "\uFFFF" converts to [-17, -65, -65] in UTF-8?
Question
Why does "\uFFFF" (which is apparently 2 bytes long) convert to [-17, -65, -65] in UTF-8 and not [-1, -1]?
System.out.println(Arrays.toString("\uFFFF".getBytes(StandardCharsets.UTF_8)));
Is this because UTF-8 uses only 6 bits in every byte for codepoints larger than 127?
Answer
0xFFFF has a bit pattern of 11111111 11111111. Divide up the bits according to UTF-8 rules (code points from U+0800 to U+FFFF take three bytes, carrying 4, 6 and 6 payload bits) and the pattern becomes 1111 111111 111111. Now add UTF-8's prefix bits (marked with asterisks) and the pattern becomes *1110*1111 *10*111111 *10*111111, which is 0xEF 0xBF 0xBF, aka 239 191 191, aka -17 -65 -65 in two's complement format (which is what Java uses for signed values; Java's byte is a signed 8-bit type and there is no unsigned byte type).
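The bit manipulation described above can be reproduced by hand. The following is only a minimal sketch: the class name Utf8ThreeByteDemo and the helpers encodeThreeBytes and toUnsigned are hypothetical names introduced for illustration, and the encoder assumes the code point lies in the three-byte range U+0800 to U+FFFF (it does not treat surrogates specially).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8ThreeByteDemo {

    // Hypothetical helper: encodes a code point in U+0800..U+FFFF as three UTF-8 bytes.
    // Surrogate code points are not rejected here; this is an illustration only.
    static byte[] encodeThreeBytes(int codePoint) {
        if (codePoint < 0x0800 || codePoint > 0xFFFF) {
            throw new IllegalArgumentException("expected a code point in U+0800..U+FFFF");
        }
        byte lead  = (byte) (0xE0 | (codePoint >> 12));         // 1110xxxx: top 4 bits
        byte mid   = (byte) (0x80 | ((codePoint >> 6) & 0x3F)); // 10xxxxxx: middle 6 bits
        byte trail = (byte) (0x80 | (codePoint & 0x3F));        // 10xxxxxx: low 6 bits
        return new byte[] { lead, mid, trail };
    }

    // Hypothetical helper: reinterprets each signed byte as an unsigned value 0..255.
    static int[] toUnsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] manual = encodeThreeBytes(0xFFFF);
        byte[] library = "\uFFFF".getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(manual));             // [-17, -65, -65]
        System.out.println(Arrays.equals(manual, library));      // true
        System.out.println(Arrays.toString(toUnsigned(manual))); // [239, 191, 191]
    }
}
```

Masking with 0xFF is what turns the signed values -17, -65, -65 back into 239, 191, 191; the bytes themselves are identical, only their interpretation differs.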