当str包含'é'或'ë'并且仅substr'e'时,INSTR(str,substr)不起作用 [英] INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

查看:135
本文介绍了当str包含'é'或'ë'并且仅substr'e'时,INSTR(str,substr)不起作用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在有关stackoverflow的另一篇文章中,我读到INSTR可用于根据相关性对结果进行排序.

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.

我对col LIKE '%str%' and INSTR(col,'str')`的理解是,它们的行为都相同.排序规则的处理方式似乎有所不同.

My understanding of col LIKE '%str%' andINSTR(col, 'str')` is that they both behave the same. There seems to be a difference in how collations are handled.

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO users (name)
VALUES ('Joël'), ('René');

SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned

尽管INSTR进行了一些转换,但它在é中找到了ë.

Although INSTR does some conversion, it finds ë in é.

SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0

我错过了什么吗?

http://sqlfiddle.com/#!2/9bf21/6 (使用mysql-version:5.5.22)

http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)

推荐答案

这是由于已验证的LOCATE()INSTR() 上的错误70767.

This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.

尽管 INSTR() 文档指出它可以用于多字节字符串,就像您注意到的那样,它不适用于像utf8_general_ci这样的归类,而utf8_general_ci这样的归类

Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive

此函数是多字节安全的,并且仅当至少一个参数是二进制字符串时才区分大小写.

This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.

错误报告指出,尽管MySQL正确地做到了这一点,但只有在 bytes 的数目也相同时,才会这样做:

The bug report states that although MySQL does this correctly it only does so when the number of bytes is also identical:

但是,当您在另一根弦中寻找一根弦时,您可以轻松地观察到它们不(完全)尊重排序规则.似乎正在发生的事情是MySQL寻找一个排序规则等于目标长度与目标长度完全相同的子字符串.这很少是真的.

However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.

如果要创建下表,要弄乱报告示例:

To pervert the reports example, if you create the following table:

create table t ( needle varchar(10), haystack varchar(10)
                  ) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");

然后运行此查询,您可以看到演示的相同行为:

then run this query, you can see the same behaviour demonstrated:

select needle
     , haystack
     , needle=haystack as `=`
     , haystack LIKE CONCAT('%',needle,'%') as `like`
     , instr(needle, haystack) as `instr`
  from t;

SQL提琴

这篇关于当str包含'é'或'ë'并且仅substr'e'时,INSTR(str,substr)不起作用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆