Unicode characters having asymmetric upper/lower case. Why?

Question

Why do the following three characters not have symmetric toLower/toUpper results?

```scala
/**
  * Written in the Scala programming language, typed into the Scala REPL.
  * Results commented accordingly.
  */
/* Unicode Character 'LATIN CAPITAL LETTER SHARP S' (U+1E9E) */
'\u1e9e'.toHexString == "1e9e" // true
'\u1e9e'.toLower.toHexString == "df" // "df" == "df"
'\u1e9e'.toHexString == '\u1e9e'.toLower.toUpper.toHexString // "1e9e" != "df"
/* Unicode Character 'KELVIN SIGN' (U+212A) */
'\u212a'.toHexString == "212a" // "212a" == "212a"
'\u212a'.toLower.toHexString == "6b" // "6b" == "6b"
'\u212a'.toHexString == '\u212a'.toLower.toUpper.toHexString // "212a" != "4b"
/* Unicode Character 'LATIN CAPITAL LETTER I WITH DOT ABOVE' (U+0130) */
'\u0130'.toHexString == "130" // "130" == "130"
'\u0130'.toLower.toHexString == "69" // "69" == "69"
'\u0130'.toHexString == '\u0130'.toLower.toUpper.toHexString // "130" != "49"
```

Recommended answer

For the first one, there is this explanation:

In the German language, the Sharp S ("ß" or U+00df) is a lowercase letter, and it capitalizes to the letters "SS".

In other words, U+1E9E lower-cases to U+00DF, but the upper-case of U+00DF is not U+1E9E.
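A minimal REPL-style sketch of why the Char round trip breaks, assuming Scala on the JVM, where Char.toUpper and Char.toLower delegate to java.lang.Character and must return exactly one character. The value names are illustrative only.

```scala
import java.util.Locale

// Char-level case mapping: exactly one Char in, one Char out.
val capitalSharpS = '\u1e9e'                 // LATIN CAPITAL LETTER SHARP S
val smallSharpS   = capitalSharpS.toLower    // U+00DF
// U+00DF has no single-character upper-case mapping, so toUpper returns it unchanged.
val charUpper     = smallSharpS.toUpper      // still U+00DF, not U+1E9E

// String-level case mapping may change the length: "ß" upper-cases to "SS".
val stringUpper = smallSharpS.toString.toUpperCase(Locale.ROOT)

println(f"${capitalSharpS.toInt}%04X -> ${smallSharpS.toInt}%04X -> ${charUpper.toInt}%04X")
println(stringUpper)
```

The String version applies the full case mapping (the "SS" rule quoted above), which is why it can return more characters than it was given, something a single Char cannot express.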

For the second one, U+212A (KELVIN SIGN) lower-cases to U+006B (LATIN SMALL LETTER K). The upper-case of U+006B is U+004B (LATIN CAPITAL LETTER K). This one seems to make sense to me.
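A small sketch of the Kelvin sign chain and one practical consequence, assuming Scala on the JVM (Char.toUpper/toLower backed by java.lang.Character). The comparison helpers are hypothetical illustrations of hand-rolled case-insensitive matching, not a library API.

```scala
// KELVIN SIGN lower-cases to the ordinary letter k, but k upper-cases to K (U+004B).
val kelvin   = '\u212a'
val viaLower = kelvin.toLower       // U+006B 'k'
val backUp   = viaLower.toUpper     // U+004B 'K', not U+212A

// Folding through lower case unifies KELVIN SIGN with 'K' and 'k';
// folding through upper case does not, because U+212A has no separate
// upper-case mapping and stays U+212A.
val equalViaLower = kelvin.toLower == 'K'.toLower   // true
val equalViaUpper = kelvin.toUpper == 'k'.toUpper   // false: U+212A vs U+004B

println(f"${kelvin.toInt}%04X -> ${viaLower.toInt}%04X -> ${backUp.toInt}%04X")
println(s"equalViaLower=$equalViaLower equalViaUpper=$equalViaUpper")
```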

For the third case, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) is a Turkish/Azerbaijani character that lower-cases to U+0069 (LATIN SMALL LETTER I). I would imagine that in a Turkish/Azerbaijani locale you'd get the proper upper-case version of U+0069 (that is, U+0130), but that mapping is locale-specific rather than universal.
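A sketch contrasting locale-independent and locale-sensitive mappings, assuming Scala on the JVM: Char.toUpper takes no locale at all, while java.lang.String.toUpperCase/toLowerCase accept a java.util.Locale and apply the Turkish dotted/dotless I rules for "tr". The value names are illustrative.

```scala
import java.util.Locale

val turkish = Locale.forLanguageTag("tr")

// Char.toUpper ignores locale: 'i' always becomes 'I' (U+0049).
val charUpperOfI = 'i'.toUpper

// String case mapping accepts a Locale; Turkish rules pair dotted and dotless I.
val trUpperOfI      = "i".toUpperCase(turkish)        // U+0130 CAPITAL I WITH DOT ABOVE
val trLowerOfDotted = "\u0130".toLowerCase(turkish)   // "i"
val trLowerOfI      = "I".toLowerCase(turkish)        // U+0131 SMALL DOTLESS I
val rootUpperOfI    = "i".toUpperCase(Locale.ROOT)    // "I"

Seq(trUpperOfI, trLowerOfDotted, trLowerOfI, rootUpperOfI).foreach { s =>
  println(s.map(c => f"U+${c.toInt}%04X").mkString(" "))
}
```

So the question's Char-based round trip can never recover U+0130; only a locale-aware String conversion does.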

Characters need not necessarily have symmetric upper- and lower-case transformations.
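To see how common this is, here is a brute-force sketch, assuming Scala on the JVM, that lists every Basic Multilingual Plane character which is upper case but does not survive toLower followed by toUpper. Supplementary code points are skipped because a Char is a single UTF-16 code unit, and the exact list depends on the Unicode version bundled with the JDK. The name asymmetric is illustrative. The three characters from the question should appear, along with others such as U+2126 OHM SIGN and U+212B ANGSTROM SIGN.

```scala
// Every BMP code unit as a Char, keeping upper-case letters whose
// lower-case mapping does not map back to the same character.
val asymmetric: IndexedSeq[Char] =
  (0 to 0xFFFF).map(_.toChar).filter { c =>
    c.isUpper && c.toLower.toUpper != c
  }

asymmetric.foreach { c =>
  println(f"U+${c.toInt}%04X ${Character.getName(c.toInt)}")
}
```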

To respond to PhiLho's comment below, the Unicode 6.0 spec has this to say about U+212A (KELVIN SIGN):

Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 OHM SIGN, U+212A KELVIN SIGN, and U+212B ANGSTROM SIGN. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, "Unicode Normalization Forms," these three characters will be replaced by their regular equivalents.

In other words, you shouldn't really be using U+212A; you should be using U+004B (LATIN CAPITAL LETTER K) instead. If you normalize your Unicode text, U+212A will be replaced with U+004B.
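A minimal sketch of that normalization step, assuming Scala on the JVM and the standard java.text.Normalizer class. NFC replaces each of the three letterlike symbols with its regular-letter equivalent, after which the Kelvin round trip from the question becomes symmetric.

```scala
import java.text.Normalizer

// U+212A, U+2126 and U+212B are canonically equivalent to regular letters,
// so NFC replaces them.
val kelvinNfc   = Normalizer.normalize("\u212a", Normalizer.Form.NFC)  // "K"  (U+004B)
val ohmNfc      = Normalizer.normalize("\u2126", Normalizer.Form.NFC)  // U+03A9 GREEK CAPITAL OMEGA
val angstromNfc = Normalizer.normalize("\u212b", Normalizer.Form.NFC)  // U+00C5 A WITH RING ABOVE

// After normalization, toLower followed by toUpper returns the same K.
val k = kelvinNfc.head
println(k.toLower.toUpper == k)
```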