Are uppercase UTF-8 characters always the same number of bytes as their lowercase variants?

Question

Obviously this is true for the Latin alphabet. But I'm asking in a conceptual sense, across languages and the Unicode spec.

Practically this came up for comparing two strings. If you already know they aren't the same number of bytes—across all languages—can you consider that enough of a guarantee that they are not differently "cased" versions of the same string?

Answer

No.

Consider U+0069 "i", which is the single byte 0x69 in UTF-8. Under Turkish (and Azerbaijani) casing rules its uppercase form is U+0130 "İ", and that code point is encoded as the two-byte UTF-8 sequence C4 B0.
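
The byte counts are easy to check by encoding each character and measuring the result. The following is a minimal Python sketch; `utf8_len` is a hypothetical helper name. Python's built-in `str.upper()` follows the default, locale-independent Unicode mapping (`"i".upper()` is `"I"`), so the Turkish pair is written out by hand. The other two pairs, U+017F LATIN SMALL LETTER LONG S and U+212A KELVIN SIGN, are extra examples added here because their default case mappings change the byte count with no locale tailoring at all.

```python
def utf8_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


# (before, after) pairs where `after` is a case mapping of `before`.
examples = {
    # Hard-coded: Turkish/Azerbaijani uppercasing of "i"; Python's str.upper()
    # does not apply this locale rule.
    "U+0069 -> U+0130 (Turkish rules)": ("i", "\u0130"),
    # Default mapping: LATIN SMALL LETTER LONG S uppercases to "S".
    "U+017F -> U+0053": ("\u017f", "\u017f".upper()),
    # Default mapping: KELVIN SIGN lowercases to "k".
    "U+212A -> U+006B": ("\u212a", "\u212a".lower()),
}

for label, (before, after) in examples.items():
    print(f"{label}: {utf8_len(before)} byte(s) -> {utf8_len(after)} byte(s)")
```

Running it reports 1 -> 2, 2 -> 1 and 3 -> 1 bytes, so a case mapping can make the UTF-8 encoding either longer or shorter.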

Obligatory note: case sensitivity.
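
For the practical part of the question, this means byte length cannot serve as a pre-filter for a case-insensitive comparison; the case-folded strings have to be compared directly. A minimal Python sketch using the built-in `str.casefold()` (Unicode default case folding) is shown below; `equal_ignoring_case` and `utf8_len` are hypothetical names. Default folding is locale-independent, so it does not apply the Turkish dotted/dotless I rules, and it does not unify composed and decomposed forms (that would additionally need `unicodedata.normalize`).

```python
def utf8_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def equal_ignoring_case(a: str, b: str) -> bool:
    """Caseless match via Unicode default case folding.

    Deliberately no byte-length shortcut: strings that differ only in
    case can have different UTF-8 lengths.
    """
    return a.casefold() == b.casefold()


a = "\u212aelvin"  # starts with KELVIN SIGN, 3 bytes in UTF-8
b = "kelvin"       # plain ASCII, 1 byte per character

print(utf8_len(a), utf8_len(b))   # 8 6
print(equal_ignoring_case(a, b))  # True
```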