Comparing a char to a code-point?
Question
What is the "correct" way of comparing a code-point to a Java character? For example:
int codepoint = str.codePointAt(0);  // str stands for any String instance
char token = '\n';
I know I can do:
if (codepoint==(int) token)
{ ... }
but this code looks fragile. Is there a formal API method for comparing codepoints to chars, or for converting the char up to a codepoint for comparison?
Recommended answer
A little bit of background: When Java appeared in 1995, the char type was based on the original "Unicode 88" specification, which was limited to 16 bits. A year later, when Unicode 2.0 was released, the concept of surrogate characters was introduced to go beyond the 16-bit limit.
Java internally represents all Strings in UTF-16 format. For code points exceeding U+FFFF, the code point is represented by a surrogate pair, i.e., two chars, the first being the high-surrogate code unit (in the range \uD800-\uDBFF) and the second being the low-surrogate code unit (in the range \uDC00-\uDFFF).
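To make the surrogate-pair representation concrete, here is a minimal sketch. The class name SurrogatePairDemo and the helper fromCodePoint are hypothetical names chosen for illustration; only the standard Character and String methods are real API.

```java
class SurrogatePairDemo {
    /** Hypothetical helper: builds a String holding exactly one code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x00E9);    // U+00E9 fits in a single char
        String supp = fromCodePoint(0x1F600);  // U+1F600 lies above U+FFFF

        System.out.println(bmp.length());                               // 1
        System.out.println(supp.length());                              // 2 (two code units)
        System.out.println(supp.codePointCount(0, supp.length()));      // 1 (one code point)
        System.out.println(Character.isHighSurrogate(supp.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(supp.charAt(1)));   // true
        System.out.println(Integer.toHexString(supp.codePointAt(0)));   // 1f600
    }
}
```

Note that length() counts chars (UTF-16 code units), not code points, which is why the supplementary character reports a length of 2.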
From the early days, all basic Character methods were based on the assumption that a code point could be represented in one char, so that's what the method signatures look like. I guess this was left unchanged when Unicode 2.0 came around in order to preserve backward compatibility, so caution is needed when dealing with them. To quote from the Java documentation:
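- The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.
- The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).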
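The difference between the two overloads quoted above can be sketched as follows. The class name IsLetterDemo and the helper isLetterAt are hypothetical; the helper is just one possible way of routing a String position through the int overload.

```java
class IsLetterDemo {
    /** Hypothetical helper: tests the full code point at index, not just one char. */
    static boolean isLetterAt(String s, int index) {
        return Character.isLetter(s.codePointAt(index));
    }

    public static void main(String[] args) {
        // char overload: a lone high surrogate is treated as an undefined character.
        System.out.println(Character.isLetter('\uD840'));   // false

        // int overload: the whole supplementary code point is examined.
        System.out.println(Character.isLetter(0x2F81A));    // true

        // The same code point stored in a String occupies two chars.
        String s = new String(Character.toChars(0x2F81A));
        System.out.println(Character.isLetter(s.charAt(0))); // false (char overload)
        System.out.println(isLetterAt(s, 0));                // true  (int overload)
    }
}
```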
Casting the char to an int, as you do in your sample, works fine though.
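As a rough illustration of why the comparison is safe, the sketch below compares a code point with a char directly. The class name CodePointCompare and the method codePointEquals are hypothetical and not part of any Java API. It relies on the fact that a char widens implicitly to an int in the range 0 to 0xFFFF, so the explicit cast is optional, and a supplementary code point can never equal any char value.

```java
class CodePointCompare {
    /** Hypothetical helper: true if the code point starting at index equals token. */
    static boolean codePointEquals(String s, int index, char token) {
        // token is widened to int automatically; no explicit cast is required.
        return s.codePointAt(index) == token;
    }

    public static void main(String[] args) {
        System.out.println(codePointEquals("\nabc", 0, '\n'));  // true
        System.out.println(codePointEquals("abc", 0, '\n'));    // false

        // For a supplementary character, codePointAt combines the surrogate pair,
        // so the result does not match the first char of the pair.
        String supp = new String(Character.toChars(0x1F600));
        System.out.println(codePointEquals(supp, 0, supp.charAt(0)));  // false
    }
}
```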