Fuzzy Text Search: Regex Wildcard Search Generator?

Question

I'm wondering whether there is a way to do fuzzy string matching in PHP. I'm looking for a word in a long string and want to find potential matches even if the word is misspelled, for example something that would still find it if it were off by one character because of an OCR error.

I was thinking a regex generator might be able to do it. Given the input "crazy", it would generate this regex:

.*((crazy)|(.+razy)|(c.+azy)|(cr.+zy)|(cra.+y)|(craz.+)).*

It would then return all matches for that word or variations of that word.

How to build the generator: I would probably split the search string/word into an array of characters, then build the regex by looping over that array with foreach and, in each iteration, replacing the character at the current key (the letter's position in the string) with ".+".
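A minimal sketch of the generator described above, assuming PCRE via preg_match_all(). The function name buildFuzzyRegex and its $wildcard parameter are hypothetical. One detail worth noting: '.' allows exactly one wrong character, while the question's '.+' allows one or more, so "c.+azy" would also match "cXXXXazy". Also, this only covers substitutions, not a missing or extra letter.

```php
<?php
// Hypothetical helper: replaces each position of $word in turn with $wildcard
// and ORs all variants together. '.' = exactly one wrong character;
// the question's '.+' = one or more characters.
function buildFuzzyRegex(string $word, string $wildcard = '.'): string
{
    // Split into characters (UTF-8 safe) and escape each one for PCRE.
    $chars = array_map(
        fn (string $c): string => preg_quote($c, '/'),
        preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY)
    );

    $alternatives = ['(' . implode('', $chars) . ')']; // the exact word
    foreach (array_keys($chars) as $i) {
        $variant = $chars;
        $variant[$i] = $wildcard;                       // this position becomes a wildcard
        $alternatives[] = '(' . implode('', $variant) . ')';
    }

    // No leading/trailing '.*' is needed: preg_match_all() finds matches
    // anywhere in the subject. 'i' ignores case, 'u' treats input as UTF-8.
    return '/' . implode('|', $alternatives) . '/iu';
}

$pattern = buildFuzzyRegex('crazy');
echo $pattern, "\n"; // /(crazy)|(.razy)|(c.azy)|(cr.zy)|(cra.y)|(craz.)/iu

preg_match_all($pattern, 'the craxy OCR output', $matches);
echo implode(', ', $matches[0]), "\n"; // craxy
```

Passing '.+' as the second argument reproduces the question's pattern, minus the surrounding '.*'.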

Is this a good way to do fuzzy text search, or is there a better way? What about a string comparison that gives me a score based on how close the match is? In short, I'm trying to find out whether some badly converted OCR text contains a given word.
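The score-based comparison asked about here can be sketched with PHP's built-in levenshtein(), which returns the number of single-character insertions, deletions and substitutions between two strings. similar_text() is another built-in option that can report a percentage. The function name findApproximateWords and the word-splitting rule are assumptions for illustration. Because levenshtein() compares bytes, the sketch assumes mostly ASCII OCR output.

```php
<?php
// Hypothetical helper: scores each word in $text by its Levenshtein edit
// distance to $target and keeps words within $maxDistance edits.
function findApproximateWords(string $text, string $target, int $maxDistance = 1): array
{
    $matches = [];
    // Assumption: any run of non-letter, non-digit characters separates words,
    // so OCR substitutions such as '4' for 'a' stay inside the word.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Case-insensitive comparison; levenshtein() works byte by byte.
        $distance = levenshtein(strtolower($word), strtolower($target));
        if ($distance <= $maxDistance) {
            $matches[] = ['word' => $word, 'distance' => $distance];
        }
    }
    return $matches;
}

foreach (findApproximateWords('That is crazzy and cr4zy talk.', 'crazy') as $m) {
    echo "{$m['word']} (distance {$m['distance']})\n";
}
// crazzy (distance 1)
// cr4zy (distance 1)
```

Unlike the regex variant, an edit distance also tolerates a missing or doubled letter ("crazzy"). It does require knowing the target word in advance.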

Recommended answer

String distance functions are useless when you don't know what the right word is. I'd suggest using the pspell functions:

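<?php
// Requires the pspell extension and an installed Aspell English dictionary.
$p = pspell_new("en");                  // open the English dictionary
print_r(pspell_suggest($p, "crazzy"));  // print suggested corrections for the misspelling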

http://www.php.net/manual/zh/function.pspell-suggest.php
