MySQL does not treat ı as i?
Problem description
I have a user table in MySQL 5.7.27 with the utf8mb4_unicode_ci collation.
Unfortunately, ı is not treated as i. For example, the query below won't find Yılmaz:
select id from users where name='Yilmaz';
I do not have this problem with other accented letters such as ä and a. For example, these two queries give exactly the same result:
select id from users where name='Märie';
select id from users where name='Marie';
I cannot simply replace ı by i and do the search, because then I would not find users with the name Yılmaz.
Do I have to use a different collation to support all umlauts?
Here is some more information about the Unicode letters:
code   | glyph | decimal | html   | description
U+0131 | ı     | 305     | &#305; | Latin Small Letter Dotless I
U+0069 | i     | 105     | -      | Latin Small Letter I
Recommended answer
Referring to http://mysql.rjweb.org/utf8_collations.html , I see that ı=i in three collations: utf8_general_ci, utf8_general_mysql500_ci, and utf8_turkish_ci. However, in the Turkish collation, I=ı sorts before the other accented I's. In all other collations, ı sorts after all the I's, as if it were treated as a separate letter.
Meanwhile, İ=I in all collations except utf8_turkish_ci.
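These comparisons can be checked directly on a server by forcing a collation on a string literal. The following is a minimal sketch, assuming a utf8mb4 connection and the utf8mb4 counterparts of the collations named above; the column aliases are only illustrative.

```sql
-- Make sure the literals below are sent and interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Each expression returns 1 if the two strings compare equal under
-- the named collation, and 0 otherwise.
SELECT 'ı' = 'i' COLLATE utf8mb4_unicode_ci AS dotless_unicode_ci,
       'ı' = 'i' COLLATE utf8mb4_general_ci AS dotless_general_ci,
       'ı' = 'i' COLLATE utf8mb4_turkish_ci AS dotless_turkish_ci,
       'İ' = 'I' COLLATE utf8mb4_general_ci AS dotted_general_ci,
       'İ' = 'I' COLLATE utf8mb4_turkish_ci AS dotted_turkish_ci;
```

Running this on your own server version is more reliable than any chart, since collation behavior can differ between releases.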
The plot thickens with MySQL 8.0. Only utf8mb4_tr_0900_ai_ci has this ordering:
I=Ì=Í=Î=Ï=Ĩ=Ī=Ĭ=Į=ı sort before i=ì=í=î=ï=ĩ=ī=ĭ=į=İ
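On a MySQL 8.0 server, the grouping above can be probed the same way. This is a sketch that assumes a utf8mb4 connection; the aliases are hypothetical names chosen for readability.

```sql
SET NAMES utf8mb4;

-- If the ordering quoted above holds, the first two comparisons
-- should report equality and the third should not.
SELECT 'ı' = 'I' COLLATE utf8mb4_tr_0900_ai_ci AS dotless_i_vs_upper_i,
       'İ' = 'i' COLLATE utf8mb4_tr_0900_ai_ci AS dotted_i_vs_lower_i,
       'ı' = 'i' COLLATE utf8mb4_tr_0900_ai_ci AS dotless_i_vs_lower_i;
```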
Meanwhile, ä=Ä, and they match most other accented A's in most collations (including the Turkish ones).
Bottom line: It seems that utf8[mb4]_general_ci is the only collation in 5.7 or 8.0 that will always treat a dotless i (or dotted I) as equal to a regular i/I and at the same time ignore umlauts.
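Applied to the query from the question, that suggests two possible approaches, sketched below. Both assume a utf8mb4 connection and the users table and name column from the question; the VARCHAR(255) definition in the second statement is an assumption and should be replaced with the column's real definition.

```sql
SET NAMES utf8mb4;

-- Option 1: override the collation for a single comparison.
SELECT id
FROM users
WHERE name = 'Yilmaz' COLLATE utf8mb4_general_ci;

-- Option 2: change the column's collation permanently.
-- VARCHAR(255) is an assumed definition, not taken from the question.
ALTER TABLE users
  MODIFY name VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

Option 1 leaves the table untouched, but a comparison under a collation different from the column's typically cannot use an index on name. Option 2 keeps the index usable but changes sorting and uniqueness behavior for every query on that column.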
Caveat: The "general" collations do not test more than one character at a time. That is, a "non-spacing umlaut" (a combining diaeresis) plus a vowel will not be treated as equal to the precomposed character.
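The caveat can be illustrated by comparing the precomposed ä (U+00E4, UTF-8 bytes C3 A4) with an a followed by the combining diaeresis U+0308 (UTF-8 bytes 61 CC 88). This is a sketch; hex literals with a character set introducer are used so the result does not depend on how the client encodes the text.

```sql
-- Left operand: precomposed 'ä'. Right operand: 'a' + combining diaeresis.
SELECT _utf8mb4 X'C3A4' = _utf8mb4 X'61CC88' COLLATE utf8mb4_general_ci
         AS equal_in_general_ci,
       _utf8mb4 X'C3A4' = _utf8mb4 X'61CC88' COLLATE utf8mb4_unicode_ci
         AS equal_in_unicode_ci;
```

If the stored data may contain decomposed sequences like this, normalizing names before inserting them is one way to sidestep the difference.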
In that link, the single character æ sorts the same as the two letters ae in some collations. This is indicated by: Aa ae=æ az. In about half of the other collations, the character æ is treated as a separate letter; this is indicated by its position after az and before b, or even after zz in the Scandinavian collations. This separate-letter concept sometimes applies to letter pairs as well, for example cs (Hungarian) and ch (traditional Spanish).
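The same literal-comparison technique shows how a given collation handles æ. This is a sketch assuming a utf8mb4 connection; utf8mb4_danish_ci is picked here only as one example of a Scandinavian collation.

```sql
SET NAMES utf8mb4;

-- Returns 1 where the collation treats 'æ' as equal to 'ae', 0 where
-- it treats 'æ' as a separate letter.
SELECT 'æ' = 'ae' COLLATE utf8mb4_unicode_ci AS ae_unicode_ci,
       'æ' = 'ae' COLLATE utf8mb4_general_ci AS ae_general_ci,
       'æ' = 'ae' COLLATE utf8mb4_danish_ci  AS ae_danish_ci;
```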