MySQL does not treat ı as i?

Question

I have a user table in MySQL 5.7.27 with utf8mb4_unicode_ci collation.

Unfortunately, ı is not treated as i. For example, the query below won't find Yılmaz:

select id from users where name='Yilmaz';

I do not have this problem with umlauts, for example ä and a. The following two queries return exactly the same result:

select id from users where name='Märie';

select id from users where name='Marie';

I cannot simply replace ı with i and then do the search, because then I would not find users with the name Yılmaz.

Do I have to use a different collation to support all umlauts?

Here is some more information about the two Unicode letters:

code    | glyph | decimal | html   | description
U+0131  | ı     | 305     | &#305; | Latin Small Letter Dotless I
U+0069  | i     | 105     | -      | Latin Small Letter I

Recommended answer

Referring to http://mysql.rjweb.org/utf8_collations.html , I see that ı=i in three collations: utf8_general_ci, utf8_general_mysql500_ci, and utf8_turkish_ci. However, in the Turkish collation, I=ı sorts before the other accented I's. In all other collations, ı sorts after all the I's, as if it were treated as a separate letter.

Meanwhile, İ=I in all collations except utf8_turkish_ci.
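
The comparisons above can be checked directly on your own server. The following is a minimal sketch, assuming the connection character set is utf8mb4 and using the utf8mb4 counterparts of the collations named above (the linked chart lists the utf8 versions); no particular result is claimed here, since it depends on the server version:

```sql
-- Make string literals utf8mb4 so the COLLATE clauses below are valid.
SET NAMES utf8mb4;

-- Each column is 1 if the collation treats the two letters as equal, else 0.
SELECT
  'ı' = 'i' COLLATE utf8mb4_general_ci AS dotless_general,
  'ı' = 'i' COLLATE utf8mb4_unicode_ci AS dotless_unicode,
  'ı' = 'i' COLLATE utf8mb4_turkish_ci AS dotless_turkish,
  'İ' = 'I' COLLATE utf8mb4_general_ci AS dotted_general,
  'İ' = 'I' COLLATE utf8mb4_turkish_ci AS dotted_turkish;
```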

The plot thickens with MySQL 8.0. utf8mb4_tr_0900_ai_ci (only) has this ordering:

I=Ì=Í=Î=Ï=Ĩ=Ī=Ĭ=Į=ı sort before  i=ì=í=î=ï=ĩ=ī=ĭ=į=İ

Meanwhile ä=Ä and they match most other accented A's for most collations (including the Turkish ones).

Bottom line: It seems that utf8[mb4]_general_ci is the only collation in 5.7 or 8.0 that will always treat a dotless i (or dotted I) as equal to a regular i/I and at the same time ignore umlauts.
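
Applied to the table in the question, that conclusion could look like the sketch below. The `users` table and `name` column come from the question; the `VARCHAR(255)` type is an assumption, so substitute the column's real definition before running the `ALTER`.

```sql
-- Option 1: override the collation for a single comparison.
-- The column keeps utf8mb4_unicode_ci; only this query uses general_ci.
SELECT id
FROM users
WHERE name COLLATE utf8mb4_general_ci = 'Yilmaz';

-- Option 2: change the column's collation permanently.
-- VARCHAR(255) is assumed here; use the column's actual type and nullability.
ALTER TABLE users
  MODIFY name VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

Option 1 leaves the schema alone, but a comparison under a collation different from the column's generally cannot use an index on that column. Option 2 keeps index lookups possible, but it changes sorting and uniqueness behavior for every query on `name`.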

Caveat: The "general" collations do not test more than one character at a time. That is, a "non-spacing umlaut" plus a vowel will not be treated as equal to the combination.
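
To see this caveat on your own server, the precomposed ä can be compared with the two-code-point sequence a + U+0308 (combining diaeresis). This is a sketch assuming a utf8mb4 connection; the hex literal `0x61CC88` is the UTF-8 encoding of that sequence.

```sql
SET NAMES utf8mb4;

-- _utf8mb4 0x61CC88 is 'a' (0x61) followed by U+0308 (0xCC 0x88).
-- According to the caveat, the general_ci column should be 0 (not equal).
SELECT
  _utf8mb4 0x61CC88 = 'ä' COLLATE utf8mb4_general_ci AS combining_general,
  _utf8mb4 0x61CC88 = 'ä' COLLATE utf8mb4_unicode_ci AS combining_unicode;
```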

On the linked page, the single character æ sorts the same as the two letters ae in some collations. That is indicated by: Aa ae=æ az. In about half of the other collations, the character æ is treated as a separate letter; this is indicated by its position after az and before b, or even after zz for the Scandinavian collations. This separate-letter concept sometimes applies to letter pairs too, for example cs (Hungarian) and ch (traditional Spanish).
