Collator compares strings weirdly

Question

I have a collection of strings that I need to sort, so I'm using a java.text.Collator. But the output is weird.

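import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CollatorSort {

    public static void main(String[] args) {
        // Locale-sensitive comparator for US English
        final Collator collator = Collator.getInstance(Locale.US);

        List<String> data = new ArrayList<String>();

        data.add("1Z5800701_AB");
        data.add("1Z5800701_AC");
        data.add("1Z5800701-A");
        data.add("1Z5800701 A");
        data.add("1Z5800701B");
        data.add("1Z5800701A");
        data.add("1Z5800701 - A");

        // Sort with the collator instead of String's natural ordering
        Collections.sort(data, new Comparator<String>() {

            @Override
            public int compare(String o1, String o2) {
                return collator.compare(o1, o2);
            }
        });

        for (String s : data) {
            System.out.println(s);
        }
    }
}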

The output is:

1Z5800701_AB
1Z5800701_AC
1Z5800701A
1Z5800701 A
1Z5800701 - A
1Z5800701-A
1Z5800701B

The last string, '1Z5800701B', should come directly after '1Z5800701A'. What am I missing here?

Recommended answer

It's a matter of the locale being used; you can reproduce the same behavior in the bash shell with LC_ALL=en_US sort. The point is that in this locale, "word separators" are treated differently from "word characters" (i.e., you can't always say that character X sorts before or after character B; it depends on context). As a result, 1Z5800701<optional separators>A sorts before 1Z5800701<optional separators>B, which is why 1Z5800701B comes after every string in which an A follows the digits, either directly or with "separators" in between. You can see more examples of such non-obvious orderings in this Wikipedia article.
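To make the difference concrete, here is a minimal Java sketch that sorts the question's strings twice: once with the en_US Collator and once with String's natural ordering (String.compareTo), which simply compares UTF-16 code units and has no notion of word separators. The class and method names (SortComparison, collatorOrder, naturalOrder) are made up for illustration, and the exact collator output may differ between JDK versions.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SortComparison {

    // The sample strings from the question.
    static final List<String> DATA = Arrays.asList(
            "1Z5800701_AB", "1Z5800701_AC", "1Z5800701-A", "1Z5800701 A",
            "1Z5800701B", "1Z5800701A", "1Z5800701 - A");

    // Locale-sensitive order, as used in the question.
    static List<String> collatorOrder(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Collator.getInstance(Locale.US));
        return copy;
    }

    // Plain lexicographic order by UTF-16 code unit (String.compareTo):
    // ' ' (32) < '-' (45) < 'A' (65) < 'B' (66) < '_' (95).
    static List<String> naturalOrder(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        // Brackets make leading/embedded spaces visible in the output.
        System.out.println("Collator (en_US):");
        for (String s : collatorOrder(DATA)) {
            System.out.println("  [" + s + "]");
        }
        System.out.println("String.compareTo:");
        for (String s : naturalOrder(DATA)) {
            System.out.println("  [" + s + "]");
        }
    }
}
```

With String.compareTo, 1Z5800701B lands directly after 1Z5800701A, but the underscore variants move to the end because '_' (95) has a higher code point than the uppercase letters. Code-point order is fixed and locale-independent, which may suit machine identifiers like these; for human-facing text, the Collator is usually still the better choice.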
