UTF8 string comparisons in MySQL

Question

We have issues with utf8 string comparisons in MySQL 5, regarding case and accents:

From what I gathered, MySQL implements collations by considering that "groups of characters should be considered equal".

For example, in the utf8_unicode_ci collation, all the letters "EÉÈÊeéèê" are in the same box (together with other variants of "e").

So if you have a table containing ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"] (in a varchar column declared with the utf8_general_ci collation):

  • when asking MySQL to sort the rows by this column, the order is arbitrary (for example, MySQL does not enforce any ordering between "é" and "É")
  • when asking MySQL to add a unique key on this column, it raises an error because it considers all the values to be equal.
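
To make the "same box" behaviour concrete, here is a minimal Python sketch. It is not MySQL's actual collation algorithm: the `fold` helper is a hypothetical stand-in that strips accents and ignores case, which roughly approximates how an accent-insensitive, case-insensitive collation groups characters.

```python
import unicodedata

WORDS = ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"]


def fold(text: str) -> str:
    """Hypothetical stand-in for a *_ci comparison key (an assumption,
    not MySQL code): decompose, drop combining accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


# Every word collapses to the same key, so a unique index sees duplicates.
keys = {fold(word) for word in WORDS}
print(keys)

# Sorting by that key cannot order the words relative to each other;
# Python's stable sort simply leaves them in their input order.
print(sorted(WORDS, key=fold) == WORDS)
```

Under this approximation all eight values share one comparison key, which matches both symptoms above: a unique key sees eight duplicates, and a sort has nothing to distinguish the rows by.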

What setting can we fiddle with to fix these two points?

PS: on a related note, I do not see any case-sensitive collation for the utf8 charset. Did I miss something?

[edit] I think my initial question still holds some interest, so I will leave it as is (and maybe one day get a positive answer).

It turned out, however, that our problems with string comparisons regarding accents were not linked to the collation of our text columns. They were linked to a configuration problem with the character_set_client parameter when talking to MySQL, which defaulted to latin1.
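
The effect of such a mismatch can be illustrated without a database. The following Python sketch only simulates the situation, under the assumption that the application sends UTF-8 bytes while the receiving side interprets them as latin1; the variable names are illustrative.

```python
original = "vidéo"

# The application sends UTF-8 bytes over the connection.
wire_bytes = original.encode("utf-8")

# A receiver that believes the client speaks latin1 reads one
# character per byte, so the two-byte "é" becomes two characters.
misread = wire_bytes.decode("latin-1")

print(misread)                      # vidÃ©o
print(len(original), len(misread))  # 5 6

# The damage is reversible as long as the bytes were not altered.
print(misread.encode("latin-1").decode("utf-8") == original)
```

Once the text has been misread this way, accent-aware comparisons no longer apply to the characters the user typed, whatever collation the column declares.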

Here is the article that explained it all to us, and allowed us to fix the problem:

Getting out of MySQL Character Set Hell

It is lengthy, but trust me, it needs that length to explain both the problem and the fix.

Recommended answer

Use a collation that considers these characters to be distinct. Maybe utf8_bin (it is case sensitive, since it does a binary comparison of characters).
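
As a rough illustration of what a binary comparison changes, the Python sketch below compares the same eight words by their UTF-8 bytes. It only approximates a binary collation and is not MySQL code; `binary_key` is a hypothetical helper.

```python
WORDS = ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"]


def binary_key(text: str) -> bytes:
    """Hypothetical stand-in for a binary collation key: the raw UTF-8 bytes."""
    return text.encode("utf-8")


# All eight values are distinct, so a unique key would accept them.
distinct = {binary_key(word) for word in WORDS}
print(len(distinct))  # 8

# The order is now fully determined, although it follows byte values
# rather than dictionary order: uppercase ASCII, lowercase ASCII,
# then the accented letters.
ordered = sorted(WORDS, key=binary_key)
print(ordered)
# ['vidEo', 'video', 'vidÈo', 'vidÉo', 'vidÊo', 'vidèo', 'vidéo', 'vidêo']
```

The trade-off is visible in the result: uniqueness and a deterministic order are restored, but "vidEo" sorts before "video" and every accented form sorts after both, which may not be the order a human reader expects.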

http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html
