How do you set strings to uppercase / lowercase in Unicode?

Views: 47. Posted: 2021/8/30 20:41:42. Tags: unicode, string, theory, low-level, uppercase


Problem description

This is mostly a theoretical question that I'm curious about. (I'm not trying to implement this myself; I'm not reinventing the wheel.)

My question is how the uppercase/lowercase equivalence table works for Unicode.

For example, if I had to do this in ASCII, I'd take a character and, if it falls within the [a-z] range, add the difference between 'A' and 'a'.

If it doesn't fall in that range, I'd have a small equivalence table for the 10 or so accented characters plus ñ. (Or I could just have a full equivalence array with 256 entries, most of which would be the same as the input.)
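The approach described above can be sketched roughly as follows. This is only an illustration of the range check plus small lookup table; the table contents and the names `ACCENTED_UPPER`, `naive_upper_char`, and `naive_upper` are made up for this example, and the table is not meant to be complete.

```python
# Hypothetical lookup table for a handful of accented letters (not exhaustive).
ACCENTED_UPPER = {
    "á": "Á", "é": "É", "í": "Í", "ó": "Ó", "ú": "Ú",
    "à": "À", "è": "È", "ü": "Ü", "ö": "Ö", "ç": "Ç", "ñ": "Ñ",
}


def naive_upper_char(ch: str) -> str:
    """Uppercase one character using the range check plus the small table."""
    if "a" <= ch <= "z":
        # Add the (negative) difference between 'A' and 'a'.
        return chr(ord(ch) + (ord("A") - ord("a")))
    # Anything not in the table is returned unchanged.
    return ACCENTED_UPPER.get(ch, ch)


def naive_upper(text: str) -> str:
    """Apply the per-character mapping to a whole string."""
    return "".join(naive_upper_char(ch) for ch in text)
```

Calling `naive_upper("hello")` goes through the range check only, while a character such as ñ goes through the table. Characters outside both are left untouched, which is exactly the limitation the question is asking about for Unicode.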

However, I'm guessing that there's a better way of specifying the equivalences in Unicode, given that there are hundreds of thousands of characters and that, in theory, a new language or set of characters can be added (and I expect that you wouldn't need to patch Windows when that happens).

Does Windows have a huge hard-coded equivalence table for each character? If not, how is this implemented?

A related question is how SQL Server implements Unicode-based accent-insensitive and case-insensitive queries. Does it have an internal table that tells it that é, ë, è, E, É, È and Ë are all equivalent to "e"?
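To make the idea of "all equivalent to e" concrete, here is a minimal sketch of one way such an equivalence can be computed from Unicode data rather than from a hand-written table. It uses Python's standard `unicodedata` module; the function name `fold_key` is an assumption for this example, and nothing here claims to be how SQL Server does it.

```python
import unicodedata


def fold_key(text: str) -> str:
    """Return a comparison key that ignores accents and case (illustrative)."""
    # Decompose each character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents); combining() is nonzero for them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is Python's Unicode-aware, aggressive lowercasing.
    return stripped.casefold()
```

With this key, two strings compare equal under an accent-insensitive, case-insensitive rule when `fold_key(a) == fold_key(b)`. Real collations are language-specific and more involved than this, so treat it only as a picture of the table-driven idea.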

That doesn't sound very fast when it comes to comparing strings.

How does it access indexes quickly? Does it index values already converted to their "base" characters, according to that field's collation?
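The guess in the question, an index built over already-converted values, can be pictured with a small sketch. This only illustrates the hypothesis and is not a description of SQL Server's internals; `fold_key`, `build_index`, and `lookup` are hypothetical names, and the folding rule is a simplified stand-in for a real collation.

```python
import unicodedata
from collections import defaultdict


def fold_key(text: str) -> str:
    """Simplified accent- and case-insensitive key (stand-in for a collation)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(values):
    """Map each folded key to the original values that share it."""
    index = defaultdict(list)
    for value in values:
        index[fold_key(value)].append(value)
    return index


def lookup(index, query: str):
    """Fold the query once, then do a single exact-match lookup."""
    return index.get(fold_key(query), [])
```

The design point is that the folding cost is paid once per stored value when the index is built and once per query, so each comparison during lookup is a plain exact match on the precomputed keys.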

Does anyone know the internals of these things?

Thanks!
