How do you set strings to uppercase / lowercase in Unicode?


Question

This is mostly a theoretical question that I'm just very curious about. (I'm not trying to do this by coding it myself; I'm not reinventing the wheel.)

My question is how the uppercase/lowercase equivalence table works for Unicode.

For example, if I had to do this in ASCII, I'd take a character, and if it falls within the [a-z] range, I'd apply the difference between A and a.
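A minimal sketch of that ASCII approach, for illustration only. Python is used here as an arbitrary choice, and `ascii_upper` is a hypothetical name, not a library function:

```python
def ascii_upper(text):
    """Uppercase only the ASCII letters a-z by applying a fixed offset."""
    offset = ord("a") - ord("A")  # 32 in ASCII
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr(ord(ch) - offset))
        else:
            out.append(ch)  # anything outside a-z is left untouched
    return "".join(out)


print(ascii_upper("hello, world"))  # HELLO, WORLD
print(ascii_upper("año"))           # AñO: ñ falls outside the a-z range
```

The second call shows where the range check stops being enough: any letter outside a-z needs some other source of mappings.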

If it doesn't fall in that range, I'd have a small equivalence table for the 10 or so accented characters plus ñ. (Or I could just have a full equivalence array with 256 entries, most of which would be the same as the input.)
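The full-array variant might look roughly like this. It is a sketch that assumes a single-byte, Latin-1-style character set; `UPPER_TABLE` and `table_upper` are hypothetical names, and the list of accented letters is illustrative rather than complete:

```python
# Start with an identity table: every code 0-255 maps to itself.
UPPER_TABLE = list(range(256))

# ASCII letters a-z map to A-Z.
for code in range(ord("a"), ord("z") + 1):
    UPPER_TABLE[code] = code - 32

# A handful of accented letters plus ñ (illustrative, not exhaustive).
for lower, upper in zip("áéíóúàèìòùñ", "ÁÉÍÓÚÀÈÌÒÙÑ"):
    UPPER_TABLE[ord(lower)] = ord(upper)


def table_upper(text):
    """Uppercase via table lookup; code points above 255 pass through."""
    return "".join(
        chr(UPPER_TABLE[ord(ch)]) if ord(ch) < 256 else ch for ch in text
    )


print(table_upper("mañana"))  # MAÑANA
```

Most entries stay equal to their index, which is the "most of which would be the same as the input" part.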

However, I'm guessing that there's a better way of specifying the equivalences in Unicode, given that there are hundreds of thousands of characters and that, theoretically, a new language or set of characters can be added (and I'd expect that you wouldn't need to patch Windows when that happens).

Does Windows have a huge hard-coded equivalence table for each character? Or how is this implemented?

A related question is how SQL Server implements Unicode-based accent-insensitive and case-insensitive queries. Does it have an internal table that tells it that é ë è E É È and Ë are all equivalent to "e"?

That doesn't sound very fast when it comes to comparing strings.

How does it access indexes quickly? Does it already index values converted to their "base" characters, corresponding to that field's collation?
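To make the "base character" idea concrete, here is a rough sketch of computing a comparison key once per value, so that later comparisons or lookups work on the key. This is not a description of how SQL Server does it; `base_key` is a hypothetical helper built on Python's standard `unicodedata` module:

```python
import unicodedata


def base_key(text):
    """Drop combining accents and fold case to get a comparison key."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so those are dropped.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold()


variants = ["é", "ë", "è", "E", "É", "È", "Ë"]
print({base_key(v) for v in variants})  # {'e'}
```

If such a key were stored alongside each value, an index could be built on the key, which is the kind of precomputation the question is guessing at.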

Does anyone know the internals for these things?

Thanks!

Recommended answer

There is a mapping file that contains all the case mappings that have a 1:1 mapping ratio. Operating systems, frameworks, and libraries usually support a specific version of Unicode, and since this case-mapping file is versioned, you get the mappings for whichever version of Unicode your particular OS, framework, or library happens to support.
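As one illustration of a library tied to a specific Unicode version, Python exposes the version of the Unicode data it bundles and applies the corresponding case mappings in `str.upper()`. The sketch below uses Python only as an example; `case_report` is a hypothetical helper, and the printed version depends on the Python build:

```python
import unicodedata


def case_report(chars):
    """Map each character to its uppercase form under the bundled data."""
    return {ch: ch.upper() for ch in chars}


# Version of the Unicode Character Database this Python build ships with.
print(unicodedata.unidata_version)

# Two 1:1 mappings, and one case that is not 1:1 (ß uppercases to "SS").
print(case_report("aéß"))
```

The ß entry shows why the answer singles out 1:1 mappings: some characters change length when their case changes, so a one-character-to-one-character table cannot cover them.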

For more information on Unicode case mappings, see: http://www.unicode.org/faq/casemap_charprop.html
