String.equalsIgnoreCase - UpperCase v. LowerCase

Question

I was browsing through the OpenJDK source and noticed a weird code path in String.equalsIgnoreCase, specifically in the method regionMatches:

if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}
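
To experiment with the quoted logic, it can be lifted into a standalone method. The sketch below is not the actual JDK source: the class name `RegionMatchesSketch` and the methods `charsMatchIgnoreCase` and `equalsIgnoreCaseSketch` are hypothetical, and surrogate pairs are deliberately ignored. It only uses `Character.toUpperCase(char)` and `Character.toLowerCase(char)`, applied in the same order as the snippet above.

```java
// Hypothetical standalone sketch of the per-character check quoted above.
// Not JDK API; BMP chars only, no surrogate-pair handling.
public class RegionMatchesSketch {

    /** Two-step check: compare uppercases, then the lowercases of those uppercases. */
    static boolean charsMatchIgnoreCase(char c1, char c2) {
        if (c1 == c2) {
            return true;                  // identical chars always match
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;                  // first step: uppercases agree
        }
        // Second step: lowercase the uppercases and compare once more.
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    /** Char-by-char comparison built on the helper above. */
    static boolean equalsIgnoreCaseSketch(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (!charsMatchIgnoreCase(a.charAt(i), b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCaseSketch("Hello", "hELLO")); // true
        System.out.println(equalsIgnoreCaseSketch("Hello", "World")); // false
    }
}
```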

I understand the comment about adjusting for a specific alphabet by checking lowercase equality, but I was wondering why there is an uppercase check at all. Why not just do everything in lowercase?

Recommended Answer

Now that the question has been reopened, I am moving my answer here.

The short answer to "Why don't they just compare lowercase only, instead of both uppercase and lowercase, if lowercase matches more cases than uppercase?": it does not match more character pairs; it merely matches different pairs.

Comparing only uppercase is not enough. For example, the ASCII letter "I" and the capital I with dot "İ" ((char)304, used in the Turkish alphabet) have different uppercase forms (both are already uppercase), but they have the same lowercase letter, "i". (Note that Turkish treats i with a dot and i without a dot as different letters, not merely as accented variants, similar to German with its umlauts ä/ö/ü vs. a/o/u.)
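
A minimal sketch of the "I" vs. "İ" case, assuming a hypothetical uppercase-only strategy named `upperOnlyMatch` (not a library method). It relies only on `Character.toUpperCase(char)` and `Character.toLowerCase(char)`.

```java
// Hypothetical uppercase-only strategy; UpperOnlyDemo and upperOnlyMatch are illustrative names.
public class UpperOnlyDemo {

    /** Compares only the uppercase forms of two chars. */
    static boolean upperOnlyMatch(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    public static void main(String[] args) {
        char asciiI = 'I';        // U+0049 LATIN CAPITAL LETTER I
        char dottedI = '\u0130';  // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, (char) 304

        // Both are already uppercase, so their uppercase forms differ.
        System.out.println(upperOnlyMatch(asciiI, dottedI));                                  // false
        // Yet both lowercase to the same ASCII 'i' (U+0069).
        System.out.println(Character.toLowerCase(asciiI) == Character.toLowerCase(dottedI)); // true
    }
}
```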

Comparing only lowercase is not enough either. For example, the ASCII letter "i" and the small dotless i "ı" ((char)305) have different lowercase forms (both are already lowercase), but they have the same uppercase letter, "I".
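
The mirror-image case can be sketched the same way, assuming a hypothetical lowercase-only strategy named `lowerOnlyMatch` (again, not a library method).

```java
// Hypothetical lowercase-only strategy; LowerOnlyDemo and lowerOnlyMatch are illustrative names.
public class LowerOnlyDemo {

    /** Compares only the lowercase forms of two chars. */
    static boolean lowerOnlyMatch(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char asciiI = 'i';          // U+0069 LATIN SMALL LETTER I
        char dotlessI = '\u0131';   // U+0131 LATIN SMALL LETTER DOTLESS I, (char) 305

        // Both are already lowercase, so their lowercase forms differ.
        System.out.println(lowerOnlyMatch(asciiI, dotlessI));                                  // false
        // Yet both uppercase to the same ASCII 'I' (U+0049).
        System.out.println(Character.toUpperCase(asciiI) == Character.toUpperCase(dotlessI)); // true
    }
}
```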

And finally, compare the capital I with dot "İ" with the small dotless i "ı". Neither their uppercase forms ("İ" vs. "I") nor their lowercase forms ("i" vs. "ı") match, but the lowercase of their uppercase is the same: "İ" lowercases to "i", and "I" lowercases to "i". I found another case of this phenomenon in the Greek letters "ϴ" and "ϑ" (char 1012 and 977).
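
A small sketch that runs three candidate strategies side by side on both pairs from the paragraph above. The class and method names are illustrative only; `lowerOfUpperMatch` corresponds to the second step of the quoted `regionMatches` code.

```java
// Illustrative comparison of three strategies; StrategyComparison and its methods are not library API.
public class StrategyComparison {

    static boolean upperMatch(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    static boolean lowerMatch(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    /** Lowercase of the uppercase, as in the second step of regionMatches. */
    static boolean lowerOfUpperMatch(char a, char b) {
        return Character.toLowerCase(Character.toUpperCase(a))
            == Character.toLowerCase(Character.toUpperCase(b));
    }

    static void report(char a, char b) {
        System.out.printf("U+%04X vs U+%04X: upper=%b lower=%b lowerOfUpper=%b%n",
            (int) a, (int) b, upperMatch(a, b), lowerMatch(a, b), lowerOfUpperMatch(a, b));
    }

    public static void main(String[] args) {
        // capital I with dot above (304) vs small dotless i (305)
        report('\u0130', '\u0131'); // U+0130 vs U+0131: upper=false lower=false lowerOfUpper=true
        // Greek capital theta symbol (1012) vs Greek theta symbol (977)
        report('\u03F4', '\u03D1'); // U+03F4 vs U+03D1: upper=false lower=false lowerOfUpper=true
    }
}
```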

So a truly case-insensitive comparison cannot get away with checking just the uppercase and lowercase forms of the original characters; it must check the lowercase of the uppercase.
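
One consequence worth spelling out: if two chars have equal uppercases, the lowercases of those uppercases are necessarily equal too, so for single BMP chars the two-step test agrees with comparing one "fold key", `toLowerCase(toUpperCase(c))`, per char. The first step then acts as a cheap early exit rather than an independent criterion. The sketch below assumes hypothetical names (`FoldKeyDemo`, `foldKey`, `foldEquals`) and ignores surrogate pairs; it is an illustration, not how the JDK is written.

```java
// Hypothetical fold-key comparison; FoldKeyDemo, foldKey and foldEquals are not JDK API.
// BMP chars only, no surrogate-pair handling.
public class FoldKeyDemo {

    /** Lowercase of the uppercase of c. */
    static char foldKey(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    /** Case-insensitive equality by comparing fold keys char by char. */
    static boolean foldEquals(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (foldKey(a.charAt(i)) != foldKey(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(foldEquals("\u0130", "\u0131")); // capital dotted I vs dotless i: true
        System.out.println(foldEquals("\u03F4", "\u03D1")); // capital theta symbol vs theta symbol: true
        System.out.println(foldEquals("i", "\u0131"));      // ASCII i vs dotless i: true
        System.out.println(foldEquals("abc", "abd"));       // false
    }
}
```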