MySQL REGEXP query - accent insensitive search

Question

I'm looking to query a database of wine names, many of which contain accents (but not in a uniform way, so similar wines may be entered with or without accents).

Consider a basic query:

SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faugères[[:>:]]'

This returns entries with 'Faugères' in the title, but not 'Faugeres'.

SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faugeres[[:>:]]'

This query does the opposite: it returns 'Faugeres' but not 'Faugères'.

I thought something like this might do the trick:

SELECT * 
FROM `table` 
WHERE `wine_name` REGEXP '[[:<:]]Faug[eèêéë]r[eèêéë]s[[:>:]]'

However, this only returns the results without the accents.

The field is collated as utf8_unicode_ci, which from what I've read is how it should be.

Any suggestions?!

Recommended answer

You're out of luck:

Warning

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
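
To see why the bracket expression from the question only matches the unaccented spelling on older servers, it can help to look at the bytes involved. The sketch below assumes a UTF-8 connection and a server older than 8.0.4. The alternation is a commonly used workaround, not something the manual prescribes, and the table and column names are simply the ones from the question.

```sql
-- Under UTF-8, 'e' is one byte but 'è' is two.
SELECT HEX(CONVERT('e' USING utf8)) AS plain_e,     -- 65
       HEX(CONVERT('è' USING utf8)) AS accented_e;  -- C3A8

-- A byte-wise engine reads [eèêéë] as a set of single bytes,
-- so it can never consume the whole two-byte sequence for 'è'.
-- Alternation compares complete byte sequences instead:
SELECT *
FROM `table`
WHERE `wine_name` REGEXP '[[:<:]]Faug(e|è|ê|é|ë)r(e|è|ê|é|ë)s[[:>:]]';
```

This still has to list every accented variant by hand, so it does not make REGEXP collation-aware; it only sidesteps the single-byte limitation of bracket expressions.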

The [[:<:]] and [[:>:]] regexp operators are markers for word boundaries. The closest you can achieve with the LIKE operator is something along these lines:

SELECT *
FROM `table`
WHERE wine_name = 'Faugères'
   OR wine_name LIKE 'Faugères %'
   OR wine_name LIKE '% Faugères'
   OR wine_name LIKE '% Faugères %'

As you can see, it's not fully equivalent because I've restricted the concept of word boundary to spaces. Adding more clauses for other boundaries would be a mess.
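
One point in favour of LIKE is that, unlike the byte-wise REGEXP, it honours the column's collation. A minimal sketch, assuming wine_name really is collated as utf8_unicode_ci as stated in the question (the last clause, for a word in the middle of the name, is my own addition):

```sql
-- utf8_unicode_ci treats 'e' and 'è' as equal, so the unaccented
-- pattern should match both 'Faugères' and 'Faugeres'.
SELECT *
FROM `table`
WHERE wine_name = 'Faugeres'
   OR wine_name LIKE 'Faugeres %'
   OR wine_name LIKE '% Faugeres'
   OR wine_name LIKE '% Faugeres %';  -- word in the middle of the name
```

The trade-off is the one described above: only spaces count as word boundaries, so a name like 'Faugeres, Rouge' would slip through the second clause.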

You could also use full text searches (although it isn't the same). When this answer was written you couldn't define full text indexes on InnoDB tables; that restriction was lifted in MySQL 5.6.

You're certainly out of luck :)

Addendum: this has changed as of MySQL 8.0:

MySQL implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. (Prior to MySQL 8.0.4, MySQL used Henry Spencer's implementation of regular expressions, which operates in byte-wise fashion and is not multibyte safe.)
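
On a server with the ICU-based engine, the bracket expression from the question should behave as intended, because each accented letter is treated as one character. A hedged sketch, assuming MySQL 8.0.4 or later and the same table and column as in the question:

```sql
-- ICU does not support [[:<:]] / [[:>:]]; \b marks a word boundary,
-- and the backslash is doubled because MySQL string literals
-- treat \ as an escape character.
SELECT *
FROM `table`
WHERE `wine_name` REGEXP '\\bFaug[eèêéë]r[eèêéë]s\\b';
```

As far as I know, the collation only influences case sensitivity for REGEXP, not accent sensitivity, which is why the accented variants are still listed explicitly here.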
