Java charAt used with characters that have two code units


Question

From Core Java, vol. 1, 9th ed., p. 69:


The character ℤ requires two code units in the UTF-16 encoding. Calling

String sentence = "ℤ is the set of integers"; // for clarity; not in book
char ch = sentence.charAt(1);

doesn't return a space but the second code unit of ℤ.

But it seems that sentence.charAt(1) does return a space. For example, the if statement in the following code evaluates to true.

String sentence = "ℤ is the set of integers";
if (sentence.charAt(1) == ' ')
    System.out.println("sentence.charAt(1) returns a space");

Why?

I'm using JDK SE 1.7.0_09 on Ubuntu 12.10, if it's relevant.

Recommended answer

It sounds like the book is saying that 'ℤ' is not in the basic multilingual plane, but in fact it is.

Java uses UTF-16, with surrogate pairs for characters that are not in the basic multilingual plane. Since 'ℤ' (U+2124) is in the basic multilingual plane, it is represented by a single code unit. In your example sentence.charAt(0) will return 'ℤ', and sentence.charAt(1) will return ' '.
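A quick way to check this yourself is to ask Java how many code units the first character needs. The sketch below is only an illustration: the class name BmpCheck and the helper codeUnitsForFirstChar are made up for this example, and ℤ is written as the escape \u2124 so the result does not depend on the source file encoding.

```java
class BmpCheck {
    // Number of UTF-16 code units (Java chars) used by the first code point of s.
    // Hypothetical helper, named for this example only.
    static int codeUnitsForFirstChar(String s) {
        return Character.charCount(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // \u2124 is the character from the question, escaped to avoid encoding issues
        String sentence = "\u2124 is the set of integers";

        System.out.println(Integer.toHexString(sentence.codePointAt(0))); // 2124
        System.out.println(codeUnitsForFirstChar(sentence));              // 1
        System.out.println(sentence.charAt(1) == ' ');                    // true
    }
}
```

Character.charCount returns 1 for any code point in the basic multilingual plane and 2 for a supplementary one, so a result of 1 here matches what the question observed.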

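If the string instead began with a supplementary character, that character would be represented by a surrogate pair, i.e. two code units. In that case sentence.charAt(0) would return the first code unit (the high surrogate), and sentence.charAt(1) would return the second code unit (the low surrogate).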
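To see the behaviour the book describes, the string has to start with a character outside the basic multilingual plane. The sketch below uses U+1D56B, a supplementary code point picked here purely for illustration (it is not from the question), and builds the string with Character.toChars so no surrogate values need to be typed by hand. The class and method names are made up for this example.

```java
class SurrogateDemo {
    // Builds the sentence from the question, but starting with an arbitrary code point.
    // Hypothetical helper, named for this example only.
    static String sentenceStartingWith(int codePoint) {
        return new String(Character.toChars(codePoint)) + " is the set of integers";
    }

    public static void main(String[] args) {
        // U+1D56B lies outside the basic multilingual plane (assumption: chosen for illustration)
        String sentence = sentenceStartingWith(0x1D56B);

        System.out.println(Character.isHighSurrogate(sentence.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(sentence.charAt(1)));  // true
        System.out.println(sentence.charAt(2) == ' ');                     // true
    }
}
```

Here the space only shows up at index 2, because indexes 0 and 1 hold the two halves of the surrogate pair.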

See http://docs.oracle.com/javase/6/docs/api/java/lang/String.html:


A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
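Since index values count char code units, code that wants "the n-th character" regardless of surrogate pairs can work in code points instead. The following is a minimal sketch, assuming that is what you need; codePointAtIndex is a hypothetical helper, and U+1D56B is again just an illustrative supplementary code point.

```java
class CodePointIndex {
    // Returns the code point at the given code point index (not char index).
    // Hypothetical helper, named for this example only.
    static int codePointAtIndex(String s, int codePointIndex) {
        // Translate "n code points from the start" into a char index first
        int charIndex = s.offsetByCodePoints(0, codePointIndex);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1D56B)) + " is";

        System.out.println(s.length());                       // 5 (char code units)
        System.out.println(s.codePointCount(0, s.length()));  // 4 (code points)
        System.out.println(codePointAtIndex(s, 1) == ' ');    // true
    }
}
```

String.codePointAt, String.codePointCount and String.offsetByCodePoints have been available since Java 5, so this also works on the JDK 7 mentioned in the question.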
