Clarifying Java's evolutionary support of Unicode

Views: 113 | Posted: 2018/12/19 22:34:42 | Tags: java string unicode unicode-string


Question

I'm finding Java's differentiation of char and codepoint to be strange and out of place.

For example, a string is an array of characters, or "letters which appear in an alphabet", in contrast to a codepoint, which MAY be a single letter or possibly a composite or surrogate pair. However, Java defines a character of a string as a char, which cannot be composite or contain a surrogate pair, and the codepoint as an int (this is fine).

But then length() seems to return the number of codepoints, while codePointCount() also returns the number of codepoints but instead combines composite characters, which ends up not really being the real count of codepoints?
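To make the comparison concrete, here is a minimal sketch of what the two methods report for a few sample strings. It relies on the documented behavior that length() counts UTF-16 code units (char values) and codePointCount() counts code points, treating a surrogate pair as one. The class name, helper names, and sample strings are made up for illustration.

```java
public class LengthVsCodePoints {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a valid surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String plain = "abc";
        String emoji = "a\uD83D\uDE00"; // 'a' + U+1F600, stored as a surrogate pair
        String combining = "e\u0301";   // 'e' + U+0301 COMBINING ACUTE ACCENT

        System.out.println(codeUnits(plain) + " " + codePoints(plain));         // 3 3
        System.out.println(codeUnits(emoji) + " " + codePoints(emoji));         // 3 2
        System.out.println(codeUnits(combining) + " " + codePoints(combining)); // 2 2
    }
}
```

In this sketch the two counts differ only for the surrogate pair. The combining sequence is counted as two by both methods, so neither method merges composite characters.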

It feels as though charAt() should return a String, so that composites and surrogates are brought along, and the results of length() and codePointCount() should be swapped.

The original implementation feels a little backwards. Is there a reason it's designed the way it is?

Update: codePointAt(), codePointBefore()

It's also worth noting that codePointAt() and codePointBefore() accept an index as a parameter. However, that index counts chars, with a range of 0 to length() - 1 for codePointAt(), and is therefore not based on the number of codepoints in the string, as one might assume.
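A small sketch of the char-based indexing described here, and one possible way to look up by code point position instead, using String.offsetByCodePoints() to convert a code point position into a char index first. The class and helper names are hypothetical.

```java
public class CodePointIndexing {
    // Returns the code point at a code point position (not a char index)
    // by translating the position into a char index first.
    static int codePointAtCodePointIndex(String s, int codePointIndex) {
        int charIndex = s.offsetByCodePoints(0, codePointIndex);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00b"; // U+1F600 (two chars) followed by 'b'

        // Char index 0 is the high surrogate, so the whole pair is decoded.
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600

        // Char index 1 lands in the middle of the pair: only the low surrogate comes back.
        System.out.println(Integer.toHexString(s.codePointAt(1))); // de00

        // Code point position 1 is 'b', which sits at char index 2.
        System.out.println(Integer.toHexString(codePointAtCodePointIndex(s, 1))); // 62
    }
}
```

Note that offsetByCodePoints() has to walk the string, so this kind of lookup is not constant-time the way charAt() is.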

Update: equalsIgnoreCase()

String.equalsIgnoreCase() uses the term normalization to describe what it does prior to comparing strings. This is a misnomer, as normalization in the context of a Unicode string can mean something entirely different. What they mean to say is that they use case-folding.
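To illustrate the distinction being drawn, the sketch below contrasts a case-insensitive comparison with Unicode normalization as provided by java.text.Normalizer. The class name, helper name, and sample strings are assumptions chosen for this example; NFC is just one of the available normalization forms.

```java
import java.text.Normalizer;

public class CaseVsNormalization {
    // Compares two strings after normalizing both to NFC (canonical composition).
    static boolean equalAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // 'é' as a single code point
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent

        // Ignoring case handles letter case only.
        System.out.println("ABC".equalsIgnoreCase("abc"));            // true

        // It does not reconcile different encodings of the same accented letter.
        System.out.println(precomposed.equalsIgnoreCase(decomposed)); // false

        // Unicode normalization does.
        System.out.println(equalAfterNfc(precomposed, decomposed));   // true
    }
}
```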
