PHP/MySQL small-scale fuzzy search

Problem description

I'm looking to implement fuzzy search for a small PHP/MySQL application. Specifically, I have a database with about 2,400 records (records are added at a rate of about 600 per year, so it's a small database). The three fields of interest are street address, last name, and date. I want to be able to search by one of those fields and have tolerance for spelling/character errors. For example, an address of "123 Main Street" should also match "123 Main St", "123 Main St.", "123 Mian St", "123 Man St", "132 Main St", etc., and likewise for name and date.

The main issues I have with answers to other similar questions:

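  • It's impossible to define synonyms for every possible misspelling, never mind doing so for dates and names.
  • Lucene and similar tools seem very heavyweight for such a limited search set (call it a maximum of 5,000 records, 3 fields per record).
  • Just using wildcards doesn't come close to covering all the possible misspellings.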

Any suggestions? I know it isn't going to be possible to do natively with MySQL, but since the data set is so limited, I'd like to keep it relatively simple. Perhaps a PHP class that gets all of the records from the DB, uses some sort of comparison algorithm, and returns the IDs of the similar records?

Thanks, Jason

Recommended answer

Razzie's answer (or using Damerau–Levenshtein) ranks a list of candidate matches according to their closeness to the search key. (Take care: if the key is "12 Main St" then "13 Main St" has the same typing distance as "12 Moin St", but you might want to rank it low or even exclude it, as with 11 and 22 Main St etc.)
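As a rough illustration of that ranking step and the house-number caveat, here is a minimal sketch that uses PHP's built-in levenshtein() (plain Levenshtein, not Damerau–Levenshtein, so a transposition such as "Mian" costs 2). The function names, the $numberWeight penalty, and the $maxDistance cutoff are assumptions made for this example, not part of Razzie's answer.

```php
<?php
// Hypothetical helper: split "12 Main St" into ["12", "main st"].
function splitHouseNumber(string $address): array
{
    $address = strtolower(trim($address));
    if (preg_match('/^(\d+)\s*(.*)$/', $address, $m)) {
        return [$m[1], $m[2]];
    }
    return ['', $address];
}

// Hypothetical scoring: edits in the house number count $numberWeight
// times as much as edits in the street name (an assumed weighting).
function addressDistance(string $a, string $b, int $numberWeight = 3): int
{
    [$numA, $restA] = splitHouseNumber($a);
    [$numB, $restB] = splitHouseNumber($b);
    return $numberWeight * levenshtein($numA, $numB)
        + levenshtein($restA, $restB);
}

// Return [id => distance] for candidates within $maxDistance, closest first.
function rankCandidates(string $key, array $candidates, int $maxDistance = 3): array
{
    $scored = [];
    foreach ($candidates as $id => $text) {
        $distance = addressDistance($key, $text);
        if ($distance <= $maxDistance) {
            $scored[$id] = $distance;
        }
    }
    asort($scored); // sort by distance, keeping the record IDs as keys
    return $scored;
}

$ranked = rankCandidates('12 Main St', [
    1 => '12 Main St',
    2 => '12 Moin St',
    3 => '13 Main St',
    4 => '77 Elm Avenue',
]);
print_r($ranked);
```

With this weighting, "12 Moin St" scores 1 and "13 Main St" scores 3, so the wrong house number ranks lower than the misspelled street name. Raising $numberWeight or lowering $maxDistance would exclude it entirely.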

But how do you select a list of candidates of a manageable size to rank?

One way is to compute the metaphone value (or values, using double-metaphone) for each word in the strings you're going to search. Save each of these metaphones in another table, together with the id of the row containing the original string. You can then search these metaphone values quickly with LIKE 'key%', where key is the metaphone of a word from the search text.
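The metaphone lookup described above could be sketched roughly as follows, using PHP's built-in metaphone(). To keep the example self-contained, an in-memory array stands in for the extra table; the table name address_metaphone, its columns, and the function names are hypothetical. Note that metaphone() ignores digits, so house numbers and dates would need separate handling.

```php
<?php
// In MySQL, the assumed side table and lookup might look like:
//   CREATE TABLE address_metaphone (
//       record_id INT NOT NULL,
//       metaphone VARCHAR(32) NOT NULL,
//       INDEX (metaphone)
//   );
//   SELECT DISTINCT record_id FROM address_metaphone
//   WHERE metaphone LIKE CONCAT(?, '%');

// Metaphone key for each alphabetic word in the text (digits are skipped).
function metaphoneKeys(string $text): array
{
    $keys = [];
    foreach (preg_split('/[^a-z]+/i', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $key = metaphone($word);
        if ($key !== '') {
            $keys[] = $key;
        }
    }
    return array_values(array_unique($keys));
}

// Build the rows that would be stored in the side table.
function buildIndex(array $records): array
{
    $rows = [];
    foreach ($records as $id => $text) {
        foreach (metaphoneKeys($text) as $key) {
            $rows[] = ['record_id' => $id, 'metaphone' => $key];
        }
    }
    return $rows;
}

// In-memory equivalent of "metaphone LIKE 'key%'" for each search word.
function findCandidates(string $search, array $indexRows): array
{
    $ids = [];
    foreach (metaphoneKeys($search) as $key) {
        foreach ($indexRows as $row) {
            if (strncmp($row['metaphone'], $key, strlen($key)) === 0) {
                $ids[$row['record_id']] = true;
            }
        }
    }
    $ids = array_keys($ids);
    sort($ids);
    return $ids;
}

$index = buildIndex([
    1 => '123 Main Street',
    2 => '123 Mian St',
    3 => '45 Oak Lane',
]);
print_r(findCandidates('123 Man St', $index));
```

Here "Main", "Mian", and "Man" all reduce to the same key, and the prefix match lets the key for "St" also find "Street". The resulting candidate IDs can then be ranked by edit distance against the original strings.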

Check out the suggested answer on this thread. It's quite neat and should work nicely for DBs that aren't huge.
