Collator compares strings weird
Question
I have a collection of strings that I need to sort. I'm using a Collator, but the output is weird.
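import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Locale-sensitive collator for US English
final Collator collator = Collator.getInstance(Locale.US);
List<String> data = new ArrayList<String>();
data.add("1Z5800701_AB");
data.add("1Z5800701_AC");
data.add("1Z5800701-A");
data.add("1Z5800701 A");
data.add("1Z5800701B");
data.add("1Z5800701A");
data.add("1Z5800701 - A");
// Sort using the collator's ordering rather than String.compareTo
Collections.sort(data, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        return collator.compare(o1, o2);
    }
});
for (String s : data) {
    System.out.println(s);
}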
The output is:
1Z5800701_AB
1Z5800701_AC
1Z5800701A
1Z5800701 A
1Z5800701 - A
1Z5800701-A
1Z5800701B
The last string, '1Z5800701B', should come right after '1Z5800701A'. What am I missing here?
Recommended answer
It's a matter of the locale used; you can reproduce the same behavior in a bash shell with LC_ALL=en_US sort. The point is that "word separators" are treated differently from "word characters" in this locale, so you can't always say that character X sorts before or after character Y; it depends on context. As a result, 1Z5800701<optional separators>A sorts before 1Z5800701<optional separators>B, which is why 1Z5800701B comes after every combination in which the A follows the digits, optionally separated by "separators". You can see more examples of "not obvious" orderings in this Wikipedia article.