How can I get a Unicode character's code?


Question

Suppose I have this:

char registered = '®';

or an umlaut, or any other Unicode character. How can I get its code?

Recommended answer

Just cast it to int:

char registered = '®';
int code = (int) registered;

In fact there is an implicit conversion from char to int, so the explicit cast shown above is not required, but I would keep it in this case to make it obvious what you're trying to do.
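As a small illustration of the point above, the following sketch shows that the explicit cast and the implicit widening conversion produce the same value. The class name `CharCodeDemo` and the helper `codeOf` are hypothetical names chosen for this example, and the character is written as the escape `\u00AE` so the result does not depend on the source file's encoding.

```java
public class CharCodeDemo {
    // Hypothetical helper: returns the UTF-16 code unit of a char.
    // The char is widened to int implicitly, so no cast is needed here.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char registered = '\u00AE';      // the registered sign
        int explicit = (int) registered; // explicit cast, as in the answer
        int implicit = registered;       // implicit char -> int widening
        System.out.println(explicit + " " + implicit); // prints "174 174"
    }
}
```

Both forms compile to the same conversion; the cast only documents the intent.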

This gives you the UTF-16 code unit, which is the same as the Unicode code point for any character defined in the Basic Multilingual Plane (BMP). (Only BMP characters can be represented as a single char value in Java; characters outside the BMP need a pair of char values.) As Andrzej Doyle's answer says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().
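To make the difference between a code unit and a code point concrete, here is a minimal sketch using a character outside the BMP. The class name `CodePointDemo` and the helper names are hypothetical; U+1F600 is chosen only as an example of a supplementary character, and `String.codePoints()` assumes Java 8 or later.

```java
public class CodePointDemo {
    // Hypothetical helper: code point starting at the given char index.
    static int codePointAt(String s, int index) {
        return Character.codePointAt(s, index);
    }

    // Hypothetical helper: every code point in the string, in order.
    static int[] allCodePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // U+1F600 written as its UTF-16 surrogate pair.
        String s = "\uD83D\uDE00";

        System.out.println(s.length());          // prints 2 (two char values)
        System.out.println((int) s.charAt(0));   // prints 55357 (0xD83D, high surrogate only)
        System.out.println(codePointAt(s, 0));   // prints 128512 (0x1F600, the full code point)
        System.out.println(allCodePoints(s).length); // prints 1
    }
}
```

Casting `s.charAt(0)` yields only the high surrogate here, which is why a cast is sufficient for BMP characters but not for arbitrary strings.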

Once you've got the UTF-16 code unit or the Unicode code point, both of which are integers, it's up to you what you do with them. If you want a string representation, you need to decide exactly what kind of representation you want. (For example, if you know the value will always be in the BMP, you might want a fixed 4-digit hex representation prefixed with U+, e.g. "U+0020" for space.) That's beyond the scope of this question though, as we don't know what the requirements are.
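If the U+ form mentioned above is what you want, one possible way to produce it is with `String.format`. This is only a sketch under that assumption; the class name `CodePointFormat` and the method `toUPlus` are hypothetical, and the question itself does not say which representation is required.

```java
public class CodePointFormat {
    // Hypothetical helper: formats a code point as "U+XXXX".
    // %04X pads to at least four uppercase hex digits; values above
    // 0xFFFF simply use more digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus(' '));       // prints U+0020
        System.out.println(toUPlus('\u00AE'));  // prints U+00AE
        System.out.println(toUPlus(0x1F600));   // prints U+1F600
    }
}
```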
