Why doesn't ICU4J match UTF-8 sort order?

Views: 129 | Published: 2020/7/13 3:43:28 | Tags: unicode, utf-8, icu4j

Question

I am having a hard time understanding unicode sorting order.

When I run Collator.getInstance(Locale.ENGLISH).compare("_", "#") under ICU4J 55.1 I get a return value of -1 indicating that _ comes before #.

However, looking at http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=dec I see that # (U+0023) comes before _ (U+005F). Why is ICU4J returning a value of -1?

Recommended answer

First, UTF-8 is just an encoding. It specifies how to store the Unicode code points physically, but does not handle sorting, comparisons, etc.

Now, the page you linked to shows everything in numerical Code Point order. That is the order things would sort in if using a binary collation (in SQL Server, that would be collations with names ending in _BIN and _BIN2). But the non-binary ordering is far more complex. The rules are described here: Unicode Collation Algorithm (UCA).
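The difference between the two kinds of ordering can be sketched in a few lines of Java. This is only an illustration under stated assumptions: it uses the JDK's built-in java.text.Collator as a stand-in for ICU4J's com.ibm.icu.text.Collator (the call shape is the same, but the underlying data differs, so the linguistic result is printed rather than asserted), and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Locale;

public class BinaryVsCollation {
    // Binary comparison: String.compareTo compares UTF-16 code units, which for
    // these two ASCII characters is the same as comparing code points.
    static int binaryCompare(String a, String b) {
        return a.compareTo(b);
    }

    // First UTF-8 byte of a string; for ASCII it equals the code point.
    static int firstUtf8Byte(String s) {
        return s.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
    }

    // Linguistic comparison. Assumption: the JDK collator stands in for ICU4J here.
    static int linguisticCompare(String a, String b) {
        return Collator.getInstance(Locale.ENGLISH).compare(a, b);
    }

    public static void main(String[] args) {
        System.out.printf("# is U+%04X, UTF-8 byte 0x%02X%n",
                "#".codePointAt(0), firstUtf8Byte("#"));
        System.out.printf("_ is U+%04X, UTF-8 byte 0x%02X%n",
                "_".codePointAt(0), firstUtf8Byte("_"));

        // Negative: in code point order, # (0x23) comes before _ (0x5F).
        System.out.println("binary:     " + binaryCompare("#", "_"));

        // Decided by collation rules, not by code point values.
        System.out.println("linguistic: " + linguisticCompare("#", "_"));
    }
}
```

The binary result depends only on the numeric values of the characters, while the linguistic result depends entirely on the collation data the collator was built from.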

The base rules are found here: http://www.unicode.org/repos/cldr/tags/release-28/common/uca/allkeys_CLDR.txt

It shows:

005F  ; [*010A.0020.0002] # LOW LINE
...
0023  ; [*0290.0020.0002] # NUMBER SIGN
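To make the two entries above concrete, here is a small sketch that extracts the first (primary) weight from each line and compares them. It assumes the default handling in which variable elements (the ones marked with *) are not ignored, so the primary weight decides the order. The class name, method name, and regular expression are illustrative and not part of ICU4J or CLDR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrimaryWeights {
    // Matches entries such as "005F  ; [*010A.0020.0002] # LOW LINE".
    // Group 1 is the code point; groups 2-4 are the three weights shown.
    private static final Pattern ENTRY = Pattern.compile(
            "^([0-9A-F]{4,6})\\s*;\\s*\\[[.*]([0-9A-F]{4})\\.([0-9A-F]{4})\\.([0-9A-F]{4})\\]");

    // Returns the primary weight (the first of the three hex fields).
    static int primaryWeight(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized entry: " + line);
        }
        return Integer.parseInt(m.group(2), 16);
    }

    public static void main(String[] args) {
        String lowLine = "005F  ; [*010A.0020.0002] # LOW LINE";
        String numberSign = "0023  ; [*0290.0020.0002] # NUMBER SIGN";

        int a = primaryWeight(lowLine);     // 0x010A
        int b = primaryWeight(numberSign);  // 0x0290

        System.out.printf("LOW LINE    primary = 0x%04X%n", a);
        System.out.printf("NUMBER SIGN primary = 0x%04X%n", b);

        // 0x010A < 0x0290, so LOW LINE sorts first even though its
        // code point (U+005F) is larger than that of NUMBER SIGN (U+0023).
        System.out.println(Integer.compare(a, b)); // prints -1
    }
}
```

The -1 printed here lines up with the -1 reported in the question: the underscore has the smaller primary weight, so it sorts before the number sign.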

It is very important to keep in mind that any locale / culture can override these base rules. Hence, while the few lines noted above explain this specific case, in other cases you would need to check http://www.unicode.org/repos/cldr/tags/release-28/common/collation/ to see whether there are any locale-specific overrides.
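As a rough illustration of how a set of rules can change the outcome, the sketch below builds a collator from a custom rule string that places # before _. It uses the JDK's java.text.RuleBasedCollator and its rule syntax, which is an assumption made for the sake of a self-contained example: CLDR locale tailorings use a different syntax, and this is not how any actual locale is defined. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class TailoredOrder {
    // Builds a collator whose only rules are: '#' first, then '_'.
    // Both characters are rule syntax characters in the JDK format,
    // so they must be wrapped in single quotes.
    static RuleBasedCollator numberSignFirst() {
        try {
            return new RuleBasedCollator("< '#' < '_'");
        } catch (ParseException e) {
            throw new IllegalStateException("Rule string failed to parse", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator tailored = numberSignFirst();

        // With these custom rules, # now sorts before _.
        System.out.println(tailored.compare("#", "_"));
    }
}
```

The same pair of strings can therefore compare differently depending on which rules are in effect, which is why the locale-specific data needs to be checked.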
