MySQL diacritic insensitive search (Arabic)


Question

I am having trouble making a diacritic-insensitive search work with Arabic text.

I have tested multiple setups for the table in question: the utf8 and utf16 character sets, with the collations utf8_general_ci, utf16_general_ci and utf16_unicode_ci.

The search works for special characters such as å and ä. For example:

select * from test where text like '%a%'

returns rows where text is a, å or ä. But it does not work with the Arabic diacritics: if the text is بِسْمِ and I search for بسم, I get no hits.

Any ideas how to get past this?

The real usage will later be in PHP (a search function), but I am working directly in the MySQL database for testing before I port it over to PHP.

(From the comments)

CREATE TABLE test (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    text text COLLATE utf8_unicode_ci,
    PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Recommended answer

(This is not an "answer", but a "resolution".)

It seems that LIKE does not work with your Arabic string. I don't know what else it fails on. I recommend you file a bug report at http://bugs.mysql.com . Here is a test case showing that neither LIKE '...' nor LIKE '%...%' finds both strings, whereas '=' does:

CREATE  TABLE so28863402 (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    txt text COLLATE utf8_unicode_ci,   -- deliberate choice of COLLATION
    PRIMARY KEY (id)
) ENGINE=InnoDB
        DEFAULT CHARSET=utf8;
INSERT INTO so28863402 (txt) VALUES
    (UNHEX('D8A8D990D8B3D992D985D990')),  -- Using hex to avoid any copy/paste issues
    (UNHEX('D8A8D8B3D985'));  -- The values should compare equal
SELECT id, txt, HEX(txt) FROM so28863402;
SELECT txt, COUNT(*) FROM so28863402 GROUP BY txt; -- GROUP BY finds them equal.
SELECT * from so28863402
    WHERE txt = 'بسم';   -- Finds both rows (correct)
SELECT * from so28863402
    WHERE txt LIKE '%بسم%';  -- Finds one row (incorrect)
-- Further checks:
SELECT * FROM so28863402 WHERE txt  =   UNHEX(  'D8A8D8B3D985'  );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX(  'D8A8D8B3D985'  );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX('25D8A8D8B3D98525'); -- x25 is '%'
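
Until LIKE behaves as hoped, one possible workaround is to strip the diacritics in the application before comparing. The MySQL manual notes that LIKE matches on a per-character basis and can therefore give different results from '=', which may be what the test case above is running into. The following is only a sketch in Python (the same idea could be ported to PHP); the function name strip_marks is hypothetical, and it assumes that dropping every Unicode nonspacing mark (category Mn) is acceptable for your data.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text with all nonspacing combining marks (category Mn) removed.

    Assumption: the Arabic diacritics to ignore (kasra, sukun, ...) are stored
    as separate combining code points, as in the test case above.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# The same two values the SQL test case inserts via UNHEX(...).
with_marks = bytes.fromhex("D8A8D990D8B3D992D985D990").decode("utf-8")
without_marks = bytes.fromhex("D8A8D8B3D985").decode("utf-8")

# After stripping, both strings are made of the same three base letters.
print(strip_marks(with_marks) == without_marks)
```

One way to use this would be to store the stripped text in an extra column (say txt_plain, a hypothetical name) next to txt, apply the same function to the search term, and run LIKE against txt_plain. Precomposed Latin letters such as å are single code points and are not affected by this sketch; they would need Unicode decomposition (NFD) first, which would also alter some Arabic letters such as alef with madda.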
