What MySQL collation is best for accepting all Unicode characters?
Problem description
Our column is currently collated to latin1_swedish_ci and special Unicode characters are, obviously, getting stripped out. We want to be able to accept chars such as U+272A ✪ and U+2764 ❤ (see this Wikipedia article), etc. I'm leaning towards utf8_unicode_ci. Would this collation handle these and other characters? I don't care about speed, as this column isn't indexed.
MySQL Version: 5.5.28-1
Recommended answer
The collation is the least of your worries. What you need to think about is the character set for the column/table/database. The collation (the rules governing how data is compared and sorted) is just a corollary of that.
MySQL supports several Unicode character sets, utf8 and utf8mb4 being the most interesting. utf8 supports only the Unicode characters in the BMP (Basic Multilingual Plane, U+0000 to U+FFFF), i.e. a subset of all of Unicode. utf8mb4, available since MySQL 5.5.3, supports all of Unicode.
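To make the BMP distinction concrete, here is a minimal Python sketch that reports a character's code point, whether it lies in the BMP, and how many bytes it needs in UTF-8. The helper names utf8_profile and minimal_mysql_charset are made up for this illustration and are not part of MySQL or any library. The sketch assumes only that MySQL's utf8 stores at most three bytes per character, which is why it stops at the BMP.

```python
def utf8_profile(ch: str) -> dict:
    """Describe one character: code point, BMP membership, UTF-8 byte length."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "in_bmp": cp <= 0xFFFF,                 # BMP is U+0000..U+FFFF
        "utf8_bytes": len(ch.encode("utf-8")),  # 1 to 4 bytes in real UTF-8
    }


def minimal_mysql_charset(text: str) -> str:
    """Hypothetical helper: 'utf8' if every character is in the BMP, else 'utf8mb4'."""
    return "utf8" if all(ord(c) <= 0xFFFF for c in text) else "utf8mb4"


if __name__ == "__main__":
    # The two characters from the question, plus one emoji outside the BMP.
    for ch in ("\u272A", "\u2764", "\U0001F600"):
        print(ch, utf8_profile(ch), minimal_mysql_charset(ch))
```

Both characters from the question (U+272A and U+2764) are in the BMP, so either character set can store them. Characters above U+FFFF, such as most emoji, need utf8mb4.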
The collation to be used with any of the Unicode encodings is most likely xxx_general_ci or xxx_unicode_ci. The former is a general, language-independent sorting and comparison algorithm. The latter is a more complete language-independent algorithm that supports more Unicode features (e.g. treating "ß" and "ss" as equivalent), but is therefore also slower.
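As a rough analogy only, the difference between a per-character comparison and one that allows expansions can be sketched in Python. This is not MySQL's collation code: str.lower() stands in for a simple one-to-one mapping, and str.casefold() stands in for a fuller Unicode comparison that can expand "ß" to "ss". The function names are invented for this illustration.

```python
def simple_equal(a: str, b: str) -> bool:
    """Compare after a one-to-one lowercase mapping (no expansions)."""
    return a.lower() == b.lower()


def full_equal(a: str, b: str) -> bool:
    """Compare after full Unicode case folding, which expands 'ß' to 'ss'."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for left, right in (("ß", "ss"), ("Straße", "STRASSE")):
        print(left, right, simple_equal(left, right), full_equal(left, right))
```

The expansion-aware comparison treats "ß" and "ss" as equal while the one-to-one mapping does not, which mirrors the kind of extra work the answer attributes to xxx_unicode_ci.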
See https://dev.mysql.com/doc/refman/5.5/zh-CN/charset-unicode-sets.html for details.