How to write a 3-byte Unicode literal in Java?

Question

I'd like to write the Unicode literal U+10428 in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I

I tried '\u10428', but it doesn't compile.

Answer

Because Java went all-in on Unicode back when people thought 64K characters would be enough for everyone (where have we heard that before?), its designers started out with UCS-2 and later upgraded to UTF-16.

But they never bothered to add an escape sequence for Unicode characters outside the Basic Multilingual Plane (BMP).

Thus, if you want to write such a character with escapes, your only recourse is to recode it manually into a UTF-16 surrogate pair and use two \u escapes, one per code unit.
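
The manual recoding is plain arithmetic: subtract 0x10000 from the code point, add the top 10 bits of the resulting 20-bit value to 0xD800 to get the high surrogate, and add the bottom 10 bits to 0xDC00 to get the low surrogate. Below is a minimal sketch of that calculation; the class name SurrogateEscapes and the helper toUtf16Escapes are hypothetical names chosen for illustration, and the helper only produces the escape text you would paste into source code.

```java
// Sketch: encode a supplementary code point as two UTF-16 escapes by hand.
// toUtf16Escapes is a hypothetical helper name, not a JDK method.
public class SurrogateEscapes {

    static String toUtf16Escapes(int codePoint) {
        // Only code points above the BMP need a surrogate pair.
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "not a supplementary code point: " + Integer.toHexString(codePoint));
        }
        int offset = codePoint - 0x10000;     // 20-bit value
        int high = 0xD800 + (offset >>> 10);  // top 10 bits -> high surrogate
        int low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low surrogate
        // The doubled backslash keeps the compiler from treating this as an escape.
        return String.format("\\u%04X\\u%04X", high, low);
    }

    public static void main(String[] args) {
        System.out.println(toUtf16Escapes(0x10428));
    }
}
```

Running it prints \uD801\uDC28. The helper deliberately rejects BMP code points, since those need only a single escape.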

For your example code point U+10428, that is "\uD801\uDC28".

I used this site for the recoding: http://rishida.net/tools/conversion/
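
If you would rather not depend on an external converter, the JDK can compute the same pair. This sketch assumes Java 7 or later for Character.highSurrogate and Character.lowSurrogate (Character.toChars has been available since Java 5); the class name JdkSurrogates is only illustrative.

```java
// Sketch: let the JDK compute the surrogate pair instead of an online converter.
public class JdkSurrogates {
    public static void main(String[] args) {
        int codePoint = 0x10428; // DESERET SMALL LETTER LONG I

        // Each method returns one UTF-16 code unit of the pair.
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        System.out.printf("%04X %04X%n", (int) high, (int) low); // D801 DC28

        // Character.toChars returns both code units for a supplementary code point.
        String fromJdk = new String(Character.toChars(codePoint));
        String fromEscapes = "\uD801\uDC28";
        System.out.println(fromJdk.equals(fromEscapes)); // true
    }
}
```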

Quoting the Java Language Specification:

3.10.5 String Literals

A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
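
A practical consequence of the quoted rule: a String written with two escapes holds one code point but two char values. The small sketch below illustrates this; the class name is illustrative, and String.codePoints() assumes Java 8 or later.

```java
// Sketch: one supplementary character written as two escapes is one code point
// but two UTF-16 code units (chars) in the resulting String.
public class CodeUnitsVsCodePoints {
    public static void main(String[] args) {
        String longI = "\uD801\uDC28"; // U+10428 as a surrogate pair

        System.out.println(longI.length());                            // 2 (UTF-16 code units)
        System.out.println(longI.codePointCount(0, longI.length()));   // 1 (code point)
        System.out.println(Integer.toHexString(longI.codePointAt(0))); // 10428

        // A char holds exactly one code unit, so a char literal cannot represent
        // U+10428; iterate by code point instead of by char.
        longI.codePoints()
             .forEach(cp -> System.out.println("U+" + Integer.toHexString(cp).toUpperCase()));
    }
}
```

Code that walks such strings char by char will see the two surrogates separately; codePointAt() and codePoints() treat the pair as a single character.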
