mysql utf8mb4_unicode_ci导致唯一键冲突 [英] mysql utf8mb4_unicode_ci cause unique key collision
问题描述
我有一张这样的桌子
CREATE TABLE `mb1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
然后我插入两行
insert into mb1(name) values('K'),('K');
请注意,第二个K是Unicode字符
Note, the second K is unicode character
+------+-----------+
| name | hex(name) |
+------+-----------+
| K | 4B |
| K | EFBCAB |
+------+-----------+
为什么它们会导致唯一的按键碰撞? utf8mb4中的字符不是吗?
Why do they cause unique key collision? Aren't they different character in utf8mb4?
删除COLLATE utf8mb4_unicode_ci后,问题消失了.
After remove COLLATE utf8mb4_unicode_ci, the problem disappeared.
推荐答案
为什么它们会导致唯一的按键碰撞? utf8mb4中的字符不是吗?
Why do they cause unique key collision? Aren't they different character in utf8mb4?
您遗漏了关于CHARACTER SET
和COLLATION
的要点. CHARACTER SET
是不同字符的集合. COLLATION
表示是否将字符视为相等-将A
和a
视为不同的字符,但将ORDER BY
和WHERE =
等对待,以相同的方式对待.
You are missing the point about CHARACTER SET
and COLLATION
. A CHARACTER SET
is a collection of different characters. A COLLATION
says whether to treat the characters as equal -- think A
and a
-- different characters, but treated for ORDER BY
and WHERE =
, etc as being the same.
mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci |
+-----------------------------------+
| 1 |
+-----------------------------------+
因此,在utf8_unicode_ci(或utf8mb4_unicode_ci)中,这两个字符被视为相等".
So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".
等于"是对UNIQUE
键的测试.
将列的COLLATION
设置为对您而言有意义的任何内容.
Set the COLLATION
for the column to whatever makes sense for you.
- utf8mb4_unicode_ci 进行良好的现实生活"比较,显然包括此内容. K = k =Ķ=ķ
- utf8mb4_unicode_ci 进行更简单的比较.特别是没有2个字符的组合与1个字符的编码匹配.确实会发生外壳折叠和口音剥离. K = k =Ķ=ķ
- utf8mb4_bin 盲目检查这些位.没有大小写折叠等.K kĶ都是不相等的.
- utf8mb4_unicode_ci for good 'real life' comparisons, apparently including this one. K=k=Ķ=ķ
- utf8mb4_unicode_ci for more simple-minded comparisons. In particular no 2-character combinations match 1-character encodings. Case folding and accent stripping does occur. K=k=Ķ=ķ
- utf8mb4_bin blindly checks the bits. No case folding, etc. K k Ķ ķ are all unequal.
utf8mb4_latvian_ci有点不同:K = k但不等于Ķ=ķ.还有其他针对其他语言(主要是西欧语言)的专业归类.
utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).
您的K
被称为全拉丁字母大写字母K",因此将其与拉丁K
进行比较是相当合理的.
Your K
is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compare equal to Latin K
.
这篇关于mysql utf8mb4_unicode_ci导致唯一键冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!