什么是最容易实现的支持模糊搜索的站点搜索应用程序? [英] Whats the easiest site search application to implement, that supports fuzzy searching?

查看:62
本文介绍了什么是最容易实现的支持模糊搜索的站点搜索应用程序?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个网站,需要搜索大约20-30k记录,这些记录主要是电影和电视节目的名称.该站点运行带有memcache的php/mysql.

我正在寻找将FULLTEXT替换为我目前拥有的soundex()搜索,这种方法虽然有效,但在很多情况下效果不是很好.

有没有易于实施的体面搜索脚本,并且可以提供体面的搜索功能(表中3列).

解决方案

ewemli的答案是正确的,但是您应该结合使用FULLTEXT和soundex映射,而不是替换全文,否则LIKE查询可能很慢. /p>

create table with_soundex (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  original TEXT,
  soundex TEXT,
  FULLTEXT (soundex)
);

insert into with_soundex (original, soundex) values 

('add some test cases', CONCAT_WS(' ', soundex('add'), soundex('some'), soundex('test'), soundex('cases'))),
('this is some text', CONCAT_WS(' ', soundex('this'), soundex('is'), soundex('some'), soundex('text'))),
('one more test case', CONCAT_WS(' ', soundex('one'), soundex('more'), soundex('test'), soundex('case'))),
('just filling the index', CONCAT_WS(' ', soundex('just'), soundex('filling'), soundex('the'), soundex('index'))),
('need one more example', CONCAT_WS(' ', soundex('need'), soundex('one'), soundex('more'), soundex('example'))),
('seems to need more', CONCAT_WS(' ', soundex('seems'), soundex('to'), soundex('need'), soundex('more')))
('some helpful cases to consider', CONCAT_WS(' ', soundex('some'), soundex('helpful'), soundex('cases'), soundex('to'), soundex('consider')))

select * from with_soundex where match(soundex) against (soundex('test'));
+----+---------------------+---------------------+
| id | original            | soundex             |
+----+---------------------+---------------------+
|  1 | add some test cases | A300 S500 T230 C000 | 
|  2 | this is some text   | T200 I200 S500 T230 | 
|  3 | one more test case  | O500 M600 T230 C000 | 
+----+---------------------+---------------------+

select * from with_soundex where match(soundex) against (CONCAT_WS(' ', soundex('test'), soundex('some')));
+----+--------------------------------+---------------------------+
| id | original                       | soundex                   |
+----+--------------------------------+---------------------------+
|  1 | add some test cases            | A300 S500 T230 C000       | 
|  2 | this is some text              | T200 I200 S500 T230       | 
|  3 | one more test case             | O500 M600 T230 C000       | 
|  7 | some helpful cases to consider | S500 H414 C000 T000 C5236 | 
+----+--------------------------------+---------------------------+

在提供最大利用索引的优势(任何查询,如'%foo'都必须扫描表中的每一行)的同时,这会带来非常好的结果(在soundex算法的范围内).

请注意在每个单词而不是整个短语上运行soundex的重要性.您也可以在每个单词上运行自己的soundex版本,而不是使用SQL,但是在这种情况下,请确保在存储和检索时都执行该操作,以防算法之间存在差异(例如,MySQL的算法不限制本身符合标准 4个字符)

I have a site that needs to search thru about 20-30k records, which are mostly movie and TV show names. The site runs php/mysql with memcache.

Im looking to replace the FULLTEXT with soundex() searching that I currently have, which works... kind of, but isn't very good in many situations.

Are there any decent search scripts out there that are simple to implement, and will provide a decent searching capability (of 3 columns in a table).

解决方案

ewemli's answer is in the right direction but you should be combining FULLTEXT and soundex mapping, not replacing the fulltext, otherwise your LIKE queries are likely be very slow.

create table with_soundex (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  original TEXT,
  soundex TEXT,
  FULLTEXT (soundex)
);

insert into with_soundex (original, soundex) values 

('add some test cases', CONCAT_WS(' ', soundex('add'), soundex('some'), soundex('test'), soundex('cases'))),
('this is some text', CONCAT_WS(' ', soundex('this'), soundex('is'), soundex('some'), soundex('text'))),
('one more test case', CONCAT_WS(' ', soundex('one'), soundex('more'), soundex('test'), soundex('case'))),
('just filling the index', CONCAT_WS(' ', soundex('just'), soundex('filling'), soundex('the'), soundex('index'))),
('need one more example', CONCAT_WS(' ', soundex('need'), soundex('one'), soundex('more'), soundex('example'))),
('seems to need more', CONCAT_WS(' ', soundex('seems'), soundex('to'), soundex('need'), soundex('more')))
('some helpful cases to consider', CONCAT_WS(' ', soundex('some'), soundex('helpful'), soundex('cases'), soundex('to'), soundex('consider')))

select * from with_soundex where match(soundex) against (soundex('test'));
+----+---------------------+---------------------+
| id | original            | soundex             |
+----+---------------------+---------------------+
|  1 | add some test cases | A300 S500 T230 C000 | 
|  2 | this is some text   | T200 I200 S500 T230 | 
|  3 | one more test case  | O500 M600 T230 C000 | 
+----+---------------------+---------------------+

select * from with_soundex where match(soundex) against (CONCAT_WS(' ', soundex('test'), soundex('some')));
+----+--------------------------------+---------------------------+
| id | original                       | soundex                   |
+----+--------------------------------+---------------------------+
|  1 | add some test cases            | A300 S500 T230 C000       | 
|  2 | this is some text              | T200 I200 S500 T230       | 
|  3 | one more test case             | O500 M600 T230 C000       | 
|  7 | some helpful cases to consider | S500 H414 C000 T000 C5236 | 
+----+--------------------------------+---------------------------+

That gives pretty good results (within the limits of the soundex algo) while taking maximum advantage of an index (any query LIKE '%foo' has to scan every row in the table).

Note the importance of running soundex on each word, not on the entire phrase. You could also run your own version of soundex on each word rather than having SQL do it but in that case make sure you do it both when storing and retrieving in case there are differences between the algorithms (for instance, MySQL's algo doesn't limit itself to the standard 4 chars)

这篇关于什么是最容易实现的支持模糊搜索的站点搜索应用程序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆