Comparing a char to a code-point?

Question

What is the "correct" way of comparing a code-point to a Java character? For example:

int codepoint = str.codePointAt(0); // str is some String; codePointAt is an instance method
char token = '\n';

I know I can probably do:

if (codepoint==(int) token)
{ ... }

but this code looks fragile. Is there a formal API method for comparing codepoints to chars, or converting the char up to a codepoint for comparison?

Recommended answer

A little bit of background: When Java appeared in 1995, the char type was based on the original "Unicode 88" specification, which was limited to 16 bits. A year later, when Unicode 2.0 was released, the concept of surrogate characters was introduced to go beyond the 16-bit limit.

Java internally represents all Strings in UTF-16 format. A code point above U+FFFF is represented by a surrogate pair, i.e., two chars: the first is the high-surrogate code unit (in the range \uD800-\uDBFF) and the second is the low-surrogate code unit (in the range \uDC00-\uDFFF).
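To make the surrogate-pair encoding concrete, here is a minimal sketch. The class name SurrogatePairDemo, the helper charsNeeded, and the sample code point U+1F600 are illustrative choices, not part of the original answer; the Character and String calls are standard JDK methods.

```java
public class SurrogatePairDemo {
    // Number of UTF-16 code units (chars) needed to encode a code point:
    // 1 for the Basic Multilingual Plane, 2 for supplementary code points.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600;                      // a code point above U+FFFF
        char[] units = Character.toChars(codePoint);  // UTF-16 encoding of it
        String s = new String(units);

        System.out.println(s.length());                           // 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));      // 1 (code point)
        System.out.println(Character.isHighSurrogate(units[0]));  // true
        System.out.println(Character.isLowSurrogate(units[1]));   // true
        System.out.println(s.codePointAt(0) == codePoint);        // true
    }
}
```

Note that length() counts chars, not code points, so the two diverge as soon as a String contains a supplementary character.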

From the early days, all basic Character methods were based on the assumption that a code point could be represented in one char, so that's what the method signatures look like. I guess that was left unchanged when Unicode 2.0 came around in order to preserve backward compatibility, so caution is needed when dealing with them. To quote from the Java documentation:


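  • The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.

  • The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).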
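The difference between the char and int overloads can be checked directly. This is a small sketch using the two examples quoted from the documentation; the class name IsLetterDemo and its helper methods are hypothetical names chosen for illustration.

```java
public class IsLetterDemo {
    // char overload: sees a single UTF-16 code unit, here a lone high surrogate.
    static boolean loneHighSurrogateIsLetter() {
        return Character.isLetter('\uD840');
    }

    // int overload: sees a full code point, here a supplementary CJK ideograph.
    static boolean supplementaryIsLetter() {
        return Character.isLetter(0x2F81A);
    }

    // Reads the code point at index 0 and passes it to the int overload.
    static boolean firstCodePointIsLetter(String s) {
        return Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(loneHighSurrogateIsLetter());  // false
        System.out.println(supplementaryIsLetter());      // true

        // The same high surrogate followed by a low surrogate forms U+20000.
        String pair = "\uD840\uDC00";
        System.out.println(firstCodePointIsLetter(pair)); // true
    }
}
```

When iterating over text that may contain supplementary characters, reading with codePointAt and calling the int overloads avoids the surrogate blind spot.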

Casting the char to an int, as you do in your sample, works fine though.
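As a sketch of why the comparison is safe: Java widens a char to an int implicitly, so the explicit cast is optional, and the numeric value of a char is its UTF-16 code unit, which equals the code point for everything in the Basic Multilingual Plane. The class CodePointCompare and the method codePointEquals below are hypothetical helpers, not a JDK API.

```java
public class CodePointCompare {
    // True if the code point starting at the given index equals the char.
    // The char operand is widened to int automatically by the == operator.
    static boolean codePointEquals(String s, int index, char token) {
        return s.codePointAt(index) == token;
    }

    public static void main(String[] args) {
        char token = '\n';

        System.out.println(codePointEquals("\nabc", 0, token));  // true
        System.out.println(codePointEquals("abc", 0, token));    // false

        // A supplementary character never equals a single char: codePointAt
        // returns the combined code point, not the leading surrogate.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(codePointEquals(emoji, 0, emoji.charAt(0)));  // false
    }
}
```

The last case shows the only real pitfall: a char taken from the middle of a surrogate pair is not a character on its own, so comparing it with a code point will simply never match.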
