Comparing a char to a code-point?


Question

What is the "correct" way of comparing a code-point to a Java character? For example:

int codepoint = str.codePointAt(0); // str is some String
char token = '\n';

I know I can do this:

if (codepoint==(int) token)
{ ... }

But this code looks fragile. Is there a formal API method for comparing code points to chars, or for converting the char to a code point for comparison?

Answer

A little bit of background: when Java appeared in 1995, the char type was based on the original "Unicode 88" specification, which was limited to 16 bits. A year later, when Unicode 2.0 was released, the concept of surrogate characters was introduced to go beyond the 16-bit limit.

Java treats all Strings as UTF-16: a String is a sequence of 16-bit char code units. A code point above U+FFFF is represented by a surrogate pair, i.e., two chars: the first is the high-surrogate code unit (in the range \uD800-\uDBFF) and the second is the low-surrogate code unit (in the range \uDC00-\uDFFF).
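To make the surrogate-pair encoding concrete, here is a small sketch. It assumes U+1F600 as an arbitrary example of a code point above U+FFFF; the class name `SurrogatePairDemo` and the helper `utf16Units` are hypothetical, while `Character.toChars`, `String.codePointCount` and `Character.isHighSurrogate`/`isLowSurrogate` are standard JDK methods.

```java
class SurrogatePairDemo {
    /** Hypothetical helper: returns the UTF-16 code units (chars) that encode a code point. */
    static char[] utf16Units(int codePoint) {
        // Character.toChars returns one char for a BMP code point
        // and a high/low surrogate pair for a supplementary one.
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // above U+FFFF, so UTF-16 needs two chars
        char[] units = utf16Units(codePoint);
        String s = new String(units);

        System.out.println(s.length());                       // 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));  // 1 (code point)
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DE00
        System.out.println(Character.isHighSurrogate(units[0])
                && Character.isLowSurrogate(units[1]));        // true
    }
}
```

The point to take away is that `length()` counts chars, not characters, so a single supplementary character occupies two positions in the String.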

From the early days, all basic Character methods were based on the assumption that a code point could be represented in one char, so that's what the method signatures look like. I guess that, to preserve backward compatibility, this was not changed when Unicode 2.0 came around, so caution is needed when using these methods. To quote from the Java documentation:

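  • The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
  • The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).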
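The difference between the char and int overloads shows up as soon as you iterate over a String. The sketch below uses U+2F81A, the code point from the documentation quote, followed by "a1". The class and method names are hypothetical. It counts letters once per char (forcing the char overload) and once per code point (using the int overload via `String.codePoints()`).

```java
class CharVsIntOverloads {
    /** Counts letters by examining each char separately (misses supplementary letters). */
    static long countLettersByChar(String s) {
        // Casting to char forces the char overload, which sees each surrogate on its own.
        return s.chars().filter(c -> Character.isLetter((char) c)).count();
    }

    /** Counts letters by examining whole code points (handles supplementary letters). */
    static long countLettersByCodePoint(String s) {
        // codePoints() combines surrogate pairs, so the int overload sees U+2F81A whole.
        return s.codePoints().filter(cp -> Character.isLetter(cp)).count();
    }

    public static void main(String[] args) {
        System.out.println(Character.isLetter('\uD840')); // false: lone surrogate char
        System.out.println(Character.isLetter(0x2F81A));  // true: full code point

        // One supplementary letter, one BMP letter, one digit.
        String s = new String(Character.toChars(0x2F81A)) + "a1";
        System.out.println(countLettersByChar(s));      // 1
        System.out.println(countLettersByCodePoint(s)); // 2
    }
}
```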

char 转换为 int,就像您在示例中所做的那样,虽然效果很好.

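Casting the char to an int, as you do in your sample, works fine though. Widening a char to an int is lossless, so for any char outside the surrogate range the resulting int is exactly that character's code point. The explicit cast is not even required, because Java widens the char to int automatically when it is compared with an int.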
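A minimal sketch of the comparison from the question, assuming a hypothetical helper `startsWithChar` that checks the first code point of a String. It relies only on `String.codePointAt` and Java's implicit char-to-int widening. It also shows the one case to be aware of: a lone surrogate char never equals a supplementary code point, because `codePointAt` combines the pair into a single value above 0xFFFF.

```java
class CodePointCompare {
    /** Hypothetical helper: true if the first code point of s equals the given char. */
    static boolean startsWithChar(String s, char token) {
        if (s.isEmpty()) {
            return false; // codePointAt(0) would throw on an empty string
        }
        int codepoint = s.codePointAt(0);
        return codepoint == token; // token is widened to int; no cast needed
    }

    public static void main(String[] args) {
        System.out.println(startsWithChar("\nfoo", '\n')); // true
        System.out.println(startsWithChar("x", '\n'));     // false

        // The first char of an emoji is only its high surrogate (0xD83D),
        // while codePointAt(0) returns the whole code point 0x1F600.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(startsWithChar(emoji, emoji.charAt(0))); // false
    }
}
```

Since a char can only ever hold a BMP value, comparing it against a code point is safe as long as you expect the match to be a BMP character such as '\n'.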
