MySQL: matching Latin (English) form input to UTF-8 (non-English) data


Question

I maintain a music database in MySQL. How do I return results stored under, e.g., 'Tiësto' when people search for 'Tiesto'?

All the data is stored under full-text indexing, if that makes any difference.

I'm already using a combination of Levenshtein-style variants in PHP and REGEXP in SQL, not to solve this problem but to improve searchability in general.

PHP:

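// Builds single-edit variants of $word (insertion, deletion, substitution).
// Renamed from Levenshtein(): PHP function names are case-insensitive, so that
// name collides with the built-in levenshtein() and cannot be redeclared.
// '.' is the "any character" wildcard in REGEXP ('_' is only a wildcard in LIKE).
function fuzzyVariants($word) {
    $words = array();
    for ($i = 0; $i < strlen($word); $i++) {
        $words[] = substr($word, 0, $i) . '.' . substr($word, $i);     // insertion
        $words[] = substr($word, 0, $i) . substr($word, $i + 1);       // deletion
        $words[] = substr($word, 0, $i) . '.' . substr($word, $i + 1); // substitution
    }
    $words[] = $word . '.'; // insertion at the end
    return $words;
}

$fuzzyartist = fuzzyVariants($_POST['searchartist']);
$searchimplode = "'" . implode("', '", $fuzzyartist) . "'";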

MySQL:

SELECT *
FROM new_track_database
WHERE artist REGEXP concat_ws('|', $searchimplode);

To add, I frequently perform character set conversions and string sanitization in PHP, but these have always gone the other way: standardising non-Latin characters. I can't get my head around performing the opposite process, and only in certain circumstances, depending on the data I have stored.

Recommended answer

A possible solution is to create another column in the database next to "artist", such as "artist_normalized". While populating the table, insert a "normalized" version of the string into it. Searches can then be performed against the artist_normalized column, with the search term normalized the same way.

Test code:

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>

Result:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

The work is done by the Transliterator class. The specified rule performs three actions: it decomposes the string (NFD), removes the diacritics (nonspacing marks), and then recomposes the string into canonical form (NFC). Transliterator in PHP is built on top of ICU, so this relies on the ICU library's Unicode data, which is comprehensive and well maintained.
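To make the three steps concrete, here is a minimal sketch that spells them out one at a time using the intl Normalizer class and a Unicode-aware regex instead of a single transliterator rule. The function name stripDiacritics is hypothetical and not part of the original answer; the sketch assumes the intl extension is loaded and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): the same three steps
// as the transliterator rule, written out explicitly.
function stripDiacritics($text)
{
    // 1. Decompose: 'ë' becomes 'e' followed by U+0308 (combining diaeresis).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);

    // 2. Remove every nonspacing mark (Unicode general category Mn).
    $stripped = preg_replace('/\p{Mn}/u', '', $decomposed);

    // 3. Recompose whatever is left into canonical composed form.
    return Normalizer::normalize($stripped, Normalizer::FORM_C);
}

echo stripDiacritics('Tiësto'), "\n"; // Tiesto
```

Characters that have no decomposition, such as '€', pass through unchanged, which matches the result shown for the test code.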

Note: this solution requires PHP 5.4 or greater with the intl extension.
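Putting the pieces together, the following is a rough sketch of how the artist_normalized column might be populated and searched. It rests on several assumptions: an in-memory SQLite database (via PDO and the pdo_sqlite extension) stands in for MySQL so the example runs without a server, the helper names normalizeArtist and findArtists are hypothetical, and the sample rows are made up for illustration. Only the table and column names come from the question and answer.

```php
<?php
// Hypothetical helper: strips diacritics with the rule from the answer.
function normalizeArtist($name)
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    return $transliterator->transliterate($name);
}

// Hypothetical helper: normalizes the search term the same way as the
// stored data, then matches it against the normalized column.
function findArtists(PDO $pdo, $term)
{
    $stmt = $pdo->prepare(
        'SELECT artist FROM new_track_database WHERE artist_normalized = ?'
    );
    $stmt->execute([normalizeArtist($term)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Assumption: SQLite in memory replaces MySQL purely to keep this runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE new_track_database (artist TEXT, artist_normalized TEXT)');

// Fill both columns at insert time (sample rows, for illustration only).
$insert = $pdo->prepare(
    'INSERT INTO new_track_database (artist, artist_normalized) VALUES (?, ?)'
);
foreach (['Tiësto', 'Röyksopp', 'Daft Punk'] as $artist) {
    $insert->execute([$artist, normalizeArtist($artist)]);
}

print_r(findArtists($pdo, 'Tiesto')); // the row stored as 'Tiësto'
```

Binding the search term as a prepared-statement parameter also keeps raw form input out of the SQL string. Searching for 'Tiësto' finds the same row, since both spellings normalize to 'Tiesto'.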
