How to write a 3-byte Unicode literal in Java?
Question
I'd like to write the Unicode literal U+10428 in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I
I tried '\u10428', but it doesn't compile.
Recommended answer
Because Java went all-in on Unicode back when people thought 64K code points would be enough for everyone (where have we heard that before?), it started out with UCS-2 and was later upgraded to UTF-16.
But they never bothered to add an escape sequence for Unicode characters outside the BMP.
Thus, for an escaped literal, your only recourse is to manually recode the code point as a UTF-16 surrogate pair and use two UTF-16 escapes.
Your example code point U+10428 is "\uD801\uDC28".
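The recoding can also be done in code instead of on a website. The following is a minimal sketch, not part of the original answer: the class name `SurrogateDemo` and the helper `toEscapes` are hypothetical names chosen for illustration. It relies on the standard `Character.toChars`, `Character.highSurrogate`, and `Character.lowSurrogate` methods and checks them against the manual surrogate arithmetic.

```java
class SurrogateDemo {
    /** Formats a code point as the one or two UTF-16 escapes needed in Java source. */
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars yields one char for BMP code points, two for supplementary ones.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 0x10428; // DESERET SMALL LETTER LONG I

        // Manual recoding: subtract 0x10000, then split the remaining 20 bits in half.
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));   // upper 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // lower 10 bits

        // The library methods should agree with the manual arithmetic.
        System.out.println(high == Character.highSurrogate(codePoint));
        System.out.println(low == Character.lowSurrogate(codePoint));

        // Print the escapes to paste into a string literal.
        System.out.println(toEscapes(codePoint));
    }
}
```

For U+10428 this prints `true`, `true`, and then `\uD801\uDC28`, matching the result above. If the escape form is not required, `new String(Character.toChars(0x10428))` builds the same string at run time.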
I used this site for the recoding: http://rishida.net/tools/conversion/
From the Java Language Specification:
3.10.5 String Literals
A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
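To make the "two escape sequences" rule concrete, here is a small sketch of how such a literal behaves at run time. The class name `SupplementaryLiteralDemo` and the constant `LONG_I` are assumptions made for illustration; the methods called are standard `String` and `Character` APIs.

```java
class SupplementaryLiteralDemo {
    // U+10428 written as two UTF-16 escapes, one per surrogate code unit.
    static final String LONG_I = "\uD801\uDC28";

    public static void main(String[] args) {
        // length() counts UTF-16 code units, so the pair counts as 2.
        System.out.println(LONG_I.length());

        // codePointCount() counts Unicode characters, so the pair counts as 1.
        System.out.println(LONG_I.codePointCount(0, LONG_I.length()));

        // codePointAt() recombines the pair into the original code point.
        System.out.println(Integer.toHexString(LONG_I.codePointAt(0)));

        // The two chars form a valid high/low surrogate pair.
        System.out.println(Character.isSurrogatePair(LONG_I.charAt(0), LONG_I.charAt(1)));
    }
}
```

This prints `2`, `1`, `10428`, and `true`. It also hints at why the `char` literal in the question cannot work: a Unicode escape takes exactly four hex digits, so the five-digit form is read as an escape for U+1042 followed by the digit `8`, and a single 16-bit `char` cannot hold a supplementary character in any case.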