Comparing a char to a code-point?
Question
What is the "correct" way of comparing a code point to a Java character? For example:
int codepoint = str.codePointAt(0); // str is any String instance; codePointAt is not a static method
char token = '\n';
I know I can probably do:
if (codepoint==(int) token)
{ ... }
but this code looks fragile. Is there a formal API method for comparing code points to chars, or for converting the char up to a code point for comparison?
Recommended answer
A little bit of background: when Java appeared in 1995, the char type was based on the original "Unicode 88" specification, which was limited to 16 bits. A year later, when Unicode 2.0 arrived, the concept of surrogate characters was introduced to go beyond the 16-bit limit.
Java internally represents all Strings in UTF-16 format. A code point above U+FFFF is represented by a surrogate pair, i.e., two chars: the first is the high-surrogate code unit (in the range \uD800-\uDBFF) and the second is the low-surrogate code unit (in the range \uDC00-\uDFFF).
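To make the surrogate-pair layout concrete, here is a minimal sketch using standard java.lang.Character and String methods. The class name SurrogateDemo and the helper unitsFor are made up for illustration; the sample code points are arbitrary choices.

```java
// Hypothetical demo class; only the Character/String calls are real API.
class SurrogateDemo {
    // Number of UTF-16 code units (chars) needed to store a code point.
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int bmp = 0x4E2D;            // U+4E2D, fits in a single char
        int supplementary = 0x2F81A; // U+2F81A, above U+FFFF

        // Encode the supplementary code point as UTF-16 code units.
        char[] pair = Character.toChars(supplementary);
        String s = new String(pair);

        System.out.println(unitsFor(bmp));                       // 1
        System.out.println(unitsFor(supplementary));             // 2
        System.out.println(s.length());                          // 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));     // 1 (code point)
        System.out.println(Character.isHighSurrogate(pair[0]));  // true
        System.out.println(Character.isLowSurrogate(pair[1]));   // true
        // Recombining the pair gives back the original code point.
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == supplementary); // true
    }
}
```

Note that String.length() counts chars, not code points, which is why the two numbers differ for supplementary characters.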
From the early days, all basic Character methods were based on the assumption that a code point could be represented in one char, so that's what the method signatures look like. I guess that, to preserve backward compatibility, this was not changed when Unicode 2.0 came around, so caution is needed when dealing with them. To quote from the Java documentation:
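- The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.
- The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).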
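The difference between the two overloads can be seen in a short sketch. The class IsLetterDemo and its helper methods are hypothetical names; the two literal calls are the ones from the documentation quote, and the string case is an added illustration.

```java
// Hypothetical demo class contrasting the char and int overloads of Character.isLetter.
class IsLetterDemo {
    // char overload: a lone surrogate is treated as an undefined character.
    static boolean loneSurrogateIsLetter() {
        return Character.isLetter('\uD840');
    }

    // int overload: the full code point is visible to the method.
    static boolean codePointIsLetter() {
        return Character.isLetter(0x2F81A);
    }

    // Same comparison on a real String holding one supplementary character.
    static boolean firstCharIsLetter(String s) {
        return Character.isLetter(s.charAt(0));      // sees only the high surrogate
    }

    static boolean firstCodePointIsLetter(String s) {
        return Character.isLetter(s.codePointAt(0)); // sees the combined code point
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x2F81A));
        System.out.println(loneSurrogateIsLetter());   // false
        System.out.println(codePointIsLetter());       // true
        System.out.println(firstCharIsLetter(s));      // false
        System.out.println(firstCodePointIsLetter(s)); // true
    }
}
```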
Casting the char to an int, as you do in your sample, works fine though.
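As a rough illustration of that comparison, the sketch below walks a string by code points and compares each one with a char token. The class CompareDemo and the methods matches and count are invented for this example. It relies on the fact that Java widens a char to an int automatically in an == comparison, so the explicit cast is optional.

```java
// Hypothetical helper class; not part of any library.
class CompareDemo {
    // The char operand is widened to int, so no explicit cast is required.
    static boolean matches(int codePoint, char token) {
        return codePoint == token;
    }

    // Counts occurrences of token, advancing one code point at a time
    // so that surrogate pairs are never split.
    static int count(String text, char token) {
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (matches(cp, token)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        String supplementary = new String(Character.toChars(0x2F81A));
        String text = "a\n" + supplementary + "\nb";
        System.out.println(count(text, '\n')); // 2
    }
}
```

A code point decoded from a well-formed surrogate pair is always above U+FFFF, so it can never be equal to any char value; the comparison only succeeds for characters that fit in a single char.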