Curious about the implementation of CaseInsensitiveComparator

Question

While checking the implementation of CaseInsensitiveComparator, which is a private static nested class of String, I found something strange.

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    ...
    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
    ...
}

What I'm curious about is this: in the for loop, once the upper-cased characters have been compared, why compare the lower-cased characters again? When Character.toUpperCase(c1) and Character.toUpperCase(c2) are different, is it possible for Character.toLowerCase(c1) and Character.toLowerCase(c2) to be equal?

Couldn't it be simplified like this?

public int compare(String s1, String s2) {
    int n1 = s1.length();
    int n2 = s2.length();
    int min = Math.min(n1, n2);
    for (int i = 0; i < min; i++) {
        char c1 = s1.charAt(i);
        char c2 = s2.charAt(i);
        if (c1 != c2) {
            c1 = Character.toUpperCase(c1);
            c2 = Character.toUpperCase(c2);
            if (c1 != c2) {
                // No overflow because of numeric promotion
                return c1 - c2;
            }
        }
    }
    return n1 - n2;
}

Am I missing something?

Recommended answer

There are Unicode characters that differ in lowercase but share the same uppercase form. For example, the Greek letter sigma has two lowercase forms (σ, and ς, which is used only at the end of a word) but only one uppercase form (Σ).
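
The sigma case can be checked directly. The following is a minimal sketch, not code from the JDK: the class name SigmaDemo and the helper mergeOnlyInUpperCase are made up for illustration, and the characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
final class SigmaDemo {
    static final char SMALL_SIGMA = '\u03C3';   // GREEK SMALL LETTER SIGMA
    static final char FINAL_SIGMA = '\u03C2';   // GREEK SMALL LETTER FINAL SIGMA
    static final char CAPITAL_SIGMA = '\u03A3'; // GREEK CAPITAL LETTER SIGMA

    /**
     * Hypothetical helper: true when the two chars are different, map to the
     * same char under Character.toUpperCase, and stay different under
     * Character.toLowerCase.
     */
    static boolean mergeOnlyInUpperCase(char a, char b) {
        return a != b
                && Character.toUpperCase(a) == Character.toUpperCase(b)
                && Character.toLowerCase(a) != Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        // Both lowercase sigmas should map to the single capital sigma.
        System.out.println(Character.toUpperCase(SMALL_SIGMA) == CAPITAL_SIGMA);
        System.out.println(Character.toUpperCase(FINAL_SIGMA) == CAPITAL_SIGMA);
        System.out.println(mergeOnlyInUpperCase(SMALL_SIGMA, FINAL_SIGMA));
        // The JDK comparator should therefore report the two strings as equal.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("\u03C3", "\u03C2"));
    }
}
```

Because the two sigmas already agree after the uppercase step, this pair never reaches the lowercase comparison, so it does not by itself explain why that second step exists.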

I could not find any examples of the reverse (characters that differ in uppercase but share a lowercase form), but if such a case exists or is added in the future, the current Java implementation is already prepared for it. Your version of the Comparator would still handle the Sigma case correctly, since it keeps the uppercase comparison.
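
One pair that does seem to exercise the lowercase step is the Latin letter K and U+212A (KELVIN SIGN). The sketch below assumes the JDK's Character tables leave U+212A unchanged under toUpperCase and map it to k under toLowerCase. UPPER_ONLY is a transcription of the simplified comparator from the question, and the class name LowerCaseStepDemo is made up for illustration.

```java
import java.util.Comparator;

final class LowerCaseStepDemo {
    /** The simplified comparator from the question: uppercase check only. */
    static final Comparator<String> UPPER_ONLY = (s1, s2) -> {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    // char operands are promoted to int, so this cannot overflow
                    return c1 - c2;
                }
            }
        }
        return n1 - n2;
    };

    public static void main(String[] args) {
        String latinK = "K";
        String kelvinSign = "\u212A"; // KELVIN SIGN, assumed to lowercase to 'k'

        // Full JDK comparator: uppercase step, then lowercase step.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare(latinK, kelvinSign));
        // Uppercase-only variant from the question.
        System.out.println(UPPER_ONLY.compare(latinK, kelvinSign));
    }
}
```

If those mappings hold, the JDK comparator treats the two strings as equal while the uppercase-only variant reports a difference, which is the situation the extra toLowerCase call guards against.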

You can find more information in the Case Mapping FAQ.
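
To see which characters behave this way on a particular JDK, a brute-force scan over all char values is one option. This is only a sketch: the class and method names are invented, it covers BMP char values rather than supplementary code points, and it mirrors the comparator by lowercasing the already uppercased character.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CaseMappingScan {
    /**
     * Returns pairs {a, b} whose uppercase forms differ but whose
     * toLowerCase(toUpperCase(x)) forms are equal, i.e. pairs that only the
     * second (lowercase) step of the comparator would treat as equal.
     */
    static List<char[]> lowerEqualUpperDifferent() {
        // Group every char by the value it has after both conversion steps.
        Map<Character, List<Character>> byFolded = new HashMap<>();
        for (int cp = Character.MIN_VALUE; cp <= Character.MAX_VALUE; cp++) {
            char c = (char) cp;
            char folded = Character.toLowerCase(Character.toUpperCase(c));
            byFolded.computeIfAbsent(folded, key -> new ArrayList<>()).add(c);
        }

        // Within each group, keep the pairs that the uppercase step alone separates.
        List<char[]> pairs = new ArrayList<>();
        for (List<Character> group : byFolded.values()) {
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    char a = group.get(i);
                    char b = group.get(j);
                    if (Character.toUpperCase(a) != Character.toUpperCase(b)) {
                        pairs.add(new char[] {a, b});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (char[] p : lowerEqualUpperDifferent()) {
            System.out.printf("U+%04X and U+%04X%n", (int) p[0], (int) p[1]);
        }
    }
}
```

The exact list depends on the Unicode version bundled with the JDK, so the output is best read as a snapshot for that runtime rather than a fixed set.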
