What does character set and collation mean exactly?

Question

I can read the MySQL documentation, and it is pretty clear. But how does one decide which character set to use? And what does collation actually have an effect on?

I'm asking for an explanation of the two and how to choose them.

Recommended answer

From the MySQL docs:


A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: 'A', 'B', 'a', 'b'. We give each letter a number: 'A' = 0, 'B' = 1, 'a' = 2, 'b' = 3. The letter 'A' is a symbol, the number 0 is the encoding for 'A', and the combination of all four letters and their encodings is a character set.
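
As a rough illustration of the paragraph above, the imaginary character set can be sketched in Python as a plain mapping from symbols to encodings. The names `CHARSET`, `encode`, and `decode` are made up for this sketch and are not part of MySQL.

```python
# Toy character set from the example: each symbol is paired with its encoding.
# CHARSET, encode and decode are hypothetical names used only for illustration.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

# Reverse lookup: encoding -> symbol.
SYMBOLS = {code: symbol for symbol, code in CHARSET.items()}


def encode(text):
    """Turn a string into the list of encodings of its symbols."""
    return [CHARSET[ch] for ch in text]


def decode(codes):
    """Turn a list of encodings back into a string."""
    return "".join(SYMBOLS[code] for code in codes)


print(encode("AbBa"))
print(decode([0, 3, 1, 2]))
```

A string outside the four-letter alphabet raises `KeyError` here, which mirrors the idea that a character set only covers the symbols it defines.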

Now, suppose that we want to compare two string values, 'A' and 'B'. The simplest way to do this is to look at the encodings: 0 for 'A' and 1 for 'B'. Because 0 is less than 1, we say 'A' is less than 'B'. Now, what we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): "compare the encodings." We call this simplest of all possible collations a binary collation.
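
The "compare the encodings" rule can be sketched as follows, again using the toy four-letter character set. This is only a model of the idea, not how MySQL implements binary collations; `binary_key` and `binary_compare` are hypothetical names.

```python
# Toy character set from the example (symbol -> encoding).
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def binary_key(text):
    """Sort key for the binary collation: just the raw encodings."""
    return [CHARSET[ch] for ch in text]


def binary_compare(left, right):
    """Return -1, 0 or 1 by comparing the encodings only."""
    left_key, right_key = binary_key(left), binary_key(right)
    return (left_key > right_key) - (left_key < right_key)


print(binary_compare("A", "B"))
print(sorted(["b", "A", "a", "B"], key=binary_key))
```

Under this rule every uppercase letter sorts before every lowercase one, simply because the uppercase letters were assigned the smaller numbers.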

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters 'a' and 'b' as equivalent to 'A' and 'B'; (2) then compare the encodings. We call this a case-insensitive collation. It's a little more complex than a binary collation.
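
The two rules of the case-insensitive collation can be modeled the same way: first fold lowercase onto uppercase, then compare encodings. The `FOLD` table and function names below are assumptions made for this sketch.

```python
# Toy character set from the example (symbol -> encoding).
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: treat 'a' and 'b' as equivalent to 'A' and 'B'.
FOLD = {"a": "A", "b": "B"}


def ci_key(text):
    """Sort key: fold case first (rule 1), then use the encodings (rule 2)."""
    return [CHARSET[FOLD.get(ch, ch)] for ch in text]


def ci_equal(left, right):
    """Two strings are equal under this collation if their keys match."""
    return ci_key(left) == ci_key(right)


print(ci_equal("ab", "AB"))
print(sorted(["b", "A", "a", "B"], key=ci_key))
```

Because Python's sort is stable, letters that compare equal under this collation ('A' and 'a', for instance) keep their input order relative to each other.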

In real life, most character sets have many characters: not just 'A' and 'B' but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules: not just case insensitivity but also accent insensitivity (an "accent" is a mark attached to a character as in German 'ö') and multiple-character mappings (such as the rule that 'ö' = 'OE' in one of the two German collations).
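
To make accent insensitivity and multiple-character mappings more concrete, here is a minimal sketch using Python's standard `unicodedata` module. It only imitates the two kinds of rule described above and is not the algorithm behind any actual MySQL collation; the function names and the `EXPANSIONS` table are hypothetical.

```python
import unicodedata


def accent_insensitive_key(text):
    """Strip accents, then ignore case: 'ö' compares like 'o'."""
    # NFD splits 'ö' into 'o' plus a combining diaeresis mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()


# Multiple-character mapping: one symbol expands to two for comparison.
EXPANSIONS = {"\u00f6": "OE", "\u00d6": "OE"}  # ö and Ö


def expansion_key(text):
    """Map 'ö' to 'OE' before comparing, ignoring case."""
    # NFC makes sure 'ö' is a single code point so the lookup can match it.
    composed = unicodedata.normalize("NFC", text)
    return "".join(EXPANSIONS.get(ch, ch) for ch in composed).upper()


word = "sch\u00f6n"  # "schön"
print(accent_insensitive_key(word) == accent_insensitive_key("schon"))
print(expansion_key(word) == expansion_key("schoen"))
```

The same word ends up equal to "schon" under one set of rules and to "schoen" under the other, which is why the choice of collation affects comparisons, sorting, and uniqueness checks.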
