Java Unicode encoding

Question

A Java char is 2 bytes, so it can hold at most 65,536 distinct values, but there are 95,221 Unicode characters. Does this mean that you can't handle certain Unicode characters in a Java application?

Does this boil down to what character encoding you are using?

Recommended answer

You can handle them all if you're careful enough.

Java's char is a UTF-16 code unit. Characters with code points above 0xFFFF (supplementary characters) are encoded as two chars, called a surrogate pair.
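As a rough illustration of the surrogate-pair encoding, the sketch below splits a supplementary code point by hand (subtract 0x10000, put the top 10 bits in a high surrogate starting at 0xD800 and the bottom 10 bits in a low surrogate starting at 0xDC00) and compares the result with the JDK's Character.toChars. The class name SurrogatePairDemo, the helper toSurrogatePair, and the choice of U+1F600 as the sample character are illustrative assumptions, not part of the original answer.

```java
public class SurrogatePairDemo {
    // Hypothetical helper: encode a supplementary code point (> 0xFFFF)
    // as a UTF-16 surrogate pair by hand.
    static char[] toSurrogatePair(int codePoint) {
        if (codePoint <= 0xFFFF || codePoint > Character.MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // U+1F600, outside the 16-bit range of a single char
        char[] manual = toSurrogatePair(cp);
        char[] library = Character.toChars(cp); // JDK does the same split

        System.out.printf("manual:  %04X %04X%n", (int) manual[0], (int) manual[1]);
        System.out.printf("library: %04X %04X%n", (int) library[0], (int) library[1]);

        // One character, but two chars: length() counts code units.
        String s = new String(library);
        System.out.println("length()         = " + s.length());
        System.out.println("codePointCount() = " + s.codePointCount(0, s.length()));
    }
}
```

Running it should print D83D DE00 for both encodings, a length() of 2, and a codePointCount() of 1, which is exactly the "one character, two chars" situation the question is worried about.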

See http://www.oracle.com/us/technologies/java/supplementary-142654.html for how to handle those characters in Java.
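"Being careful" mostly means working with code points (int) rather than individual chars. A minimal sketch, assuming Java 8 or later for String.codePoints(): the naive charAt loop sees the emoji as two surrogate chars, while codePointAt plus Character.charCount, or codePoints(), treats it as a single character. The class name CodePointIteration and its countCodePoints helper are hypothetical names chosen for this example.

```java
public class CodePointIteration {
    // Hypothetical helper: step through the string one code point at a time,
    // advancing by Character.charCount so a surrogate pair is consumed as one unit.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // "a", U+1F600, "b"

        // Naive: each char visited separately; the emoji appears as two surrogates.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.printf("char %d: U+%04X surrogate=%b%n",
                    i, (int) c, Character.isSurrogate(c));
        }

        // Careful (Java 8+): codePoints() yields whole code points.
        s.codePoints().forEach(cp ->
                System.out.printf("code point U+%04X (%d char(s))%n",
                        cp, Character.charCount(cp)));

        System.out.println("chars = " + s.length()
                + ", code points = " + countCodePoints(s));
    }
}
```

The last line should report 4 chars but 3 code points. The explicit loop is shown alongside codePoints() because it also works on Java 5 through 7, where codePointAt and charCount already exist.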

(BTW, in Unicode 5.2 there are 107,154 assigned characters out of 1,114,112 code points.)
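The 1,114,112 figure is the size of the whole code space, U+0000 through U+10FFFF, which is 17 planes of 65,536. The sketch below derives these numbers from constants in java.lang.Character and also checks that the 1,024 high surrogates times 1,024 low surrogates cover exactly the code points above U+FFFF, which is why a single pair of chars is always enough. The class name CodeSpace and its field names are assumptions made for illustration; it does not attempt to count assigned characters, since that depends on the Unicode version.

```java
public class CodeSpace {
    // 16-bit char: '\u0000'..'\uFFFF'
    static final int CHAR_VALUES = Character.MAX_VALUE + 1;
    // Full Unicode code space: U+0000..U+10FFFF
    static final int CODE_POINTS = Character.MAX_CODE_POINT + 1;
    // Number of 65,536-sized planes
    static final int PLANES = CODE_POINTS / CHAR_VALUES;
    // Code points above U+FFFF (planes 1 through 16)
    static final int SUPPLEMENTARY = CODE_POINTS - Character.MIN_SUPPLEMENTARY_CODE_POINT;
    // Every (high, low) surrogate combination
    static final int SURROGATE_PAIRS =
            (Character.MAX_HIGH_SURROGATE - Character.MIN_HIGH_SURROGATE + 1)
          * (Character.MAX_LOW_SURROGATE - Character.MIN_LOW_SURROGATE + 1);

    public static void main(String[] args) {
        System.out.println("char values:           " + CHAR_VALUES);
        System.out.println("Unicode code points:   " + CODE_POINTS);
        System.out.println("planes:                " + PLANES);
        System.out.println("supplementary points:  " + SUPPLEMENTARY);
        System.out.println("surrogate pairs:       " + SURROGATE_PAIRS);
    }
}
```

Expected values are 65,536 char values, 1,114,112 code points, 17 planes, and 1,048,576 for both the supplementary range and the number of possible surrogate pairs.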