nvarchar(Unicode)列的COLLATIONS有什么意义? [英] What is the point of COLLATIONS for nvarchar (Unicode) columns?

查看:85
本文介绍了nvarchar(Unicode)列的COLLATIONS有什么意义?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

IVe对此有很多了解。

IVe read a lot about this.

还有一些问题:

我不是在谈论区分大小写此处...

Im not talking about case sensitive here...


  • 如果我有一个字符(ש例如),他存储在 nvarchar 中-可以容纳任何内容,为什么我在这里需要 collat​​ion

  • If I have a char (ש for example) and he is stored in nvarchar - which can hold anything , Why would I need collation here ?

如果我是 FaceBook,并且我需要能够存储个字符>所有语言,归类和我的nvarchar列之间是什么关系?

If I'm "FaceBook" and i need the ability to store all chars from all languages , What is the relationship between the collation and my nvarchar columns ?

预先感谢。

推荐答案

存储和表示字符是一回事,而知道如何排序和比较字符是另一回事。

Storing and representing characters is one thing, and knowing how to sort and compare them is another.

Unicode数据,存储在<$ c $ SQL Server中c> XML 和 N 前缀的类型可以表示所有语言的所有字符(在大多数情况下,这就是其目标) )和一个字符集。因此对于 NCHAR / NVARCHAR 数据(我省略了 NTEXT (因为它不再使用)和 XML (因为它不受归类影响),因此归类不会更改可以存储的字符。对于 CHAR VARCHAR 数据,排序规则 do 影响可以存储为每个排序规则的内容指向特定的代码页,该代码页确定可以存储在值128-255中的内容。

Unicode data, stored in the XML and N-prefixed types in SQL Server, can represent all characters in all languages (for the most part, and that is its goal) with a single character set. So for NCHAR / NVARCHAR data (I am leaving out NTEXT as it shouldn't be used anymore, and XML as it is not affected by Collations), the Collations do not change what characters can be stored. For CHAR and VARCHAR data, the Collations do affect what can be stored as each Collation points to a particular Code Page, which determines what can be stored in values 128 - 255.

现在,尽管所有字符都有默认的排序顺序,但不可能跨所有语言和文化开展工作。有许多语言共享某些/许多/所有字符,但是对于如何对它们进行排序有不同的规则。例如,在使用字母的大多数字母中,字母 C位于字母 D之前。在美国英语中, C和 H(即 CH作为两个单独的字母)的组合自然会出现在以 D开头的任何字符串之前。但是,在某些语言中, CH的两个字母的组合很特殊,并且在之后 D进行排序:

Now, while there is a default sort order for all characters, that cannot possibly work across all languages and cultures. There are many languages that share some / many / all characters, but have different rules for how to sort them. For example, the letter "C" comes before the letter "D" in most alphabets that use those letters. In US English, a combination of "C" and "H" (i.e. "CH" as two separate letters) would naturally come before any string starting with a "D". But, in a few languages, the two-letter combination of "CH" is special and sorts after "D":

IF (   N'CH' COLLATE Czech_CI_AI > N'D' COLLATE Czech_CI_AI
   AND N'C'  COLLATE Czech_CI_AI < N'D' COLLATE Czech_CI_AI
   AND N'CI' COLLATE Czech_CI_AI < N'D' COLLATE Czech_CI_AI
   ) PRINT 'Czech_CI_AI';

IF (   N'CH' COLLATE Czech_100_CI_AI > N'D' COLLATE Czech_100_CI_AI
   AND N'C'  COLLATE Czech_100_CI_AI < N'D' COLLATE Czech_100_CI_AI
   AND N'CI' COLLATE Czech_100_CI_AI < N'D' COLLATE Czech_100_CI_AI
   ) PRINT 'Czech_100_CI_AI';

IF (   N'CH' COLLATE Slovak_CI_AI > N'D' COLLATE Slovak_CI_AI
   AND N'C'  COLLATE Slovak_CI_AI < N'D' COLLATE Slovak_CI_AI
   AND N'CI' COLLATE Slovak_CI_AI < N'D' COLLATE Slovak_CI_AI
   ) PRINT 'Slovak_CI_AI';

IF (   N'CH' COLLATE Slovak_CS_AS > N'D' COLLATE Slovak_CS_AS
   AND N'C'  COLLATE Slovak_CS_AS < N'D' COLLATE Slovak_CS_AS
   AND N'CI' COLLATE Slovak_CS_AS < N'D' COLLATE Slovak_CS_AS
   ) PRINT 'Slovak_CS_AS';

IF (   N'CH' COLLATE Latin1_General_100_CI_AS > N'D' COLLATE Latin1_General_100_CI_AS
   AND N'C'  COLLATE Latin1_General_100_CI_AS < N'D' COLLATE Latin1_General_100_CI_AS
   AND N'CI' COLLATE Latin1_General_100_CI_AS < N'D' COLLATE Latin1_General_100_CI_AS
   ) PRINT 'Latin1_General_100_CI_AS'
ELSE PRINT 'Nope!';

返回值:

Czech_CI_AI
Czech_100_CI_AI
Slovak_CI_AI
Slovak_CS_AS
Nope!

要查看各种文化的排序规则示例,请参见:整理图表

To see examples of sorting rules across various cultures, please see: Collation Charts.

此外,某些语言字母或字母组合在大多数其他语言中与其他字母等效。例如,仅在丹麦语中,å等于 aa。但是,å不等于一个 a:

Also, in some languages certain letters or combinations of letters equate to other letters in ways that they do not in most other languages. For example, only in Danish does a "å" equate to "aa". But, the "å" does not equate to just a single "a":

IF (N'aa' COLLATE Danish_Greenlandic_100_CI_AI =  N'å' COLLATE Danish_Greenlandic_100_CI_AI
AND N'a'  COLLATE Danish_Greenlandic_100_CI_AI <> N'å' COLLATE Danish_Greenlandic_100_CI_AI
   ) PRINT 'Danish_Greenlandic_100_CI_AI';

IF (   N'aa' COLLATE Danish_Norwegian_CI_AI =  N'å' COLLATE Danish_Norwegian_CI_AI
   AND N'a'  COLLATE Danish_Norwegian_CI_AI <> N'å' COLLATE Danish_Norwegian_CI_AI
   ) PRINT 'Danish_Norwegian_CI_AI';

IF (   N'aa' COLLATE Latin1_General_100_CI_AI =  N'å' COLLATE Latin1_General_100_CI_AI
   AND N'a'  COLLATE Latin1_General_100_CI_AI <> N'å' COLLATE Latin1_General_100_CI_AI
   ) PRINT 'Latin1_General_100_CI_AI'
ELSE PRINT 'Nope!';

返回值:

Danish_Greenlandic_100_CI_AI
Danish_Norwegian_CI_AI
Nope!

这都是非常复杂的,我什至没有提到从右到左语言的处理(希伯来语和阿拉伯语),中文,日语,组合字符等。

This is all highly complex, and I haven't even mentioned handling for right-to-left languages (Hebrew and Arabic), Chinese, Japanese, combining characters, etc.

如果您想深入了解规则,请查看 Unicode排序算法(UCA)。上面的示例基于该文档中的示例,尽管我不相信UCA中的所有规则都已实现,尤其是自Windows归类(以开头的归类 not SQL _ )基于Unicode 5.0或6.0,具体取决于您所使用的操作系统以及所安装的.NET Framework的版本(请参阅 SortVersion 了解详情)。

If you want some deep insight into the rules, check out the Unicode Collation Algorithm (UCA). The examples above are based on examples in that documentation, though I do not believe all of the rules in the UCA have been implemented, especially since the Windows collations (collations not starting with SQL_) are based on Unicode 5.0 or 6.0, depending on the which OS you are using and the version of the .NET Framework that is installed (see SortVersion for details).

整理是做什么的。如果要查看所有可用的排序规则,只需运行以下命令:

So that is what the Collations do. If you want to see all of the Collations that are available, just run the following:

SELECT [name] FROM sys.fn_helpcollations() ORDER BY [name];

这篇关于nvarchar(Unicode)列的COLLATIONS有什么意义?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆