mysql utf8mb4_unicode_ci导致唯一键冲突 [英] mysql utf8mb4_unicode_ci cause unique key collision

查看:422
本文介绍了mysql utf8mb4_unicode_ci导致唯一键冲突的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张这样的桌子

CREATE TABLE `mb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

然后我插入两行

insert into mb1(name) values('K'),('K');

请注意,第二个K是Unicode字符

Note, the second K is unicode character

+------+-----------+
| name | hex(name) |
+------+-----------+
| K    | 4B        |
| K   | EFBCAB    |
+------+-----------+

为什么它们会导致唯一的按键碰撞? utf8mb4中的字符不是吗?

Why do they cause unique key collision? Aren't they different character in utf8mb4?

删除COLLATE utf8mb4_unicode_ci后,问题消失了.

After remove COLLATE utf8mb4_unicode_ci, the problem disappeared.

推荐答案

为什么它们会导致唯一的按键碰撞? utf8mb4中的字符不是吗?

Why do they cause unique key collision? Aren't they different character in utf8mb4?

您遗漏了关于CHARACTER SETCOLLATION的要点. CHARACTER SET是不同字符的集合. COLLATION表示是否将字符视为相等-将Aa视为不同的字符,但将ORDER BYWHERE =等对待,以相同的方式对待.

You are missing the point about CHARACTER SET and COLLATION. A CHARACTER SET is a collection of different characters. A COLLATION says whether to treat the characters as equal -- think A and a -- different characters, but treated for ORDER BY and WHERE =, etc as being the same.

mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci  |
+-----------------------------------+
|                                 1 |
+-----------------------------------+

因此,在utf8_unicode_ci(或utf8mb4_unicode_ci)中,这两个字符被视为相等".

So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".

等于"是对UNIQUE键的测试.

将列的COLLATION设置为对您而言有意义的任何内容.

Set the COLLATION for the column to whatever makes sense for you.

  • utf8mb4_unicode_ci 进行良好的现实生活"比较,显然包括此内容. K = k =Ķ=ķ
  • utf8mb4_unicode_ci 进行更简单的比较.特别是没有2个字符的组合与1个字符的编码匹配.确实会发生外壳折叠和口音剥离. K = k =Ķ=ķ
  • utf8mb4_bin 盲目检查这些位.没有大小写折叠等.K kĶ都是不相等的.
  • utf8mb4_unicode_ci for good 'real life' comparisons, apparently including this one. K=k=Ķ=ķ
  • utf8mb4_unicode_ci for more simple-minded comparisons. In particular no 2-character combinations match 1-character encodings. Case folding and accent stripping does occur. K=k=Ķ=ķ
  • utf8mb4_bin blindly checks the bits. No case folding, etc. K k Ķ ķ are all unequal.

utf8mb4_latvian_ci有点不同:K = k但不等于Ķ=ķ.还有其他针对其他语言(主要是西欧语言)的专业归类.

utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).

您的被称为全拉丁字母大写字母K",因此将其与拉丁K进行比较是相当合理的.

Your is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compare equal to Latin K.

这篇关于mysql utf8mb4_unicode_ci导致唯一键冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆