How can I match two strings even if they are 1 character different?

Problem description

I have a large database of sentences, and a problem: sentences like "i'm good" do not match "im good" (and vice versa), and "is that mine?" does not match "is that mine" (and vice versa), when I want them to be detected as a match.

I have written complicated, messy functions that try to do this with wildcards, and I have researched it, but it's just a big mess. I'm sure there must be a way to search with 1 character of leeway. If possible, I would like to control which characters get this leeway; in my examples the main culprits are the question mark and the apostrophe (? ').
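
A narrow version of the leeway described above can be sketched in PHP as follows. The helper name matchesWithLeeway() and its $allowed parameter are hypothetical, not from the original post; it assumes the allowed characters are single-byte (ASCII) such as ? and ', and it accepts two strings only if they are identical or differ by exactly one extra allowed character.

```php
<?php
// Hypothetical helper (not from the original post): true when $a and $b
// are identical, or when one equals the other plus a single extra
// character taken from $allowed. Assumes $allowed is ASCII only.
function matchesWithLeeway(string $a, string $b, string $allowed = "?'"): bool
{
    if ($a === $b) {
        return true;
    }
    // Put the longer string in $long; lengths must differ by exactly one byte.
    [$long, $short] = strlen($a) > strlen($b) ? [$a, $b] : [$b, $a];
    if (strlen($long) !== strlen($short) + 1) {
        return false;
    }
    // Walk the common prefix to the first position where they diverge.
    $i = 0;
    $n = strlen($short);
    while ($i < $n && $long[$i] === $short[$i]) {
        $i++;
    }
    // The extra character must be one we grant leeway to, and removing
    // it must make the two strings identical.
    return strpos($allowed, $long[$i]) !== false
        && substr($long, 0, $i) . substr($long, $i + 1) === $short;
}

var_dump(matchesWithLeeway("i'm good", "im good"));           // bool(true)
var_dump(matchesWithLeeway("is that mine", "is that mine?")); // bool(true)
var_dump(matchesWithLeeway("im good", "i'm god"));            // bool(false)
var_dump(matchesWithLeeway("is that mine", "is that mines")); // bool(false)
```

Checking only the first divergence point is enough: if deleting some character makes the strings equal, deleting the character at the first mismatch does too, because any valid deletion sits in a run of identical characters that ends there.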

I'm currently using a plain SELECT query with PHP and MySQL to do the matching.

I would love some help figuring this out so I can clean up the big mess of code that currently does the job inconsistently.

In case anyone wants to see it, the query that checks for matches looks like this:

$checkqwry = "select * from `eng-jap` where (eng = '$eng' or english = '$oldeng' or english = '$oldeng2') and (jap = '$jap' or japanese = '$oldjap' or japanese = '$oldjap2');";

The purpose of the query is simply to check whether a translation with $eng and $jap already exists in the DB. The reason you see $oldeng, $oldeng2, $oldeng3 and so on is, as I said, my messy attempts to match whether or not there is a question mark, etc.: some of the $oldeng variables have question marks or apostrophes and the others don't. There is more code above this that appends and removes question marks and the like. Yes, it's a big mess.

Recommended answer

You want to use a string metric algorithm. PHP has two built in: levenshtein() (http://php.net/manual/en/function.levenshtein.php) and similar_text() (http://www.php.net/manual/en/function.similar-text.php).
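
A minimal sketch of both built-ins applied to the examples from the question. The helper name isCloseMatch() and the threshold of one edit are assumptions for illustration. Note that levenshtein() compares bytes, so it is most predictable on ASCII text.

```php
<?php
// Sketch: treat two sentences as a match when their Levenshtein distance
// is at most $maxEdits. The default of 1 is an assumption chosen to
// mirror the "one character of leeway" in the question.
function isCloseMatch(string $a, string $b, int $maxEdits = 1): bool
{
    // levenshtein() counts the insertions, deletions and substitutions
    // needed to turn $a into $b.
    return levenshtein($a, $b) <= $maxEdits;
}

var_dump(levenshtein("i'm good", "im good"));           // int(1)
var_dump(levenshtein("is that mine?", "is that mine")); // int(1)
var_dump(isCloseMatch("i'm good", "im good"));          // bool(true)
var_dump(isCloseMatch("i'm good", "im not good"));      // bool(false)

// similar_text() returns the number of matching characters and, through
// its optional by-reference third argument, a similarity percentage.
similar_text("is that mine?", "is that mine", $percent);
printf("%.1f%%\n", $percent); // 96.0%
```

A plain one-edit threshold gives leeway to every character, so "is that mine" would also match "is that fine". If that is too loose, the threshold or the comparison needs tuning for your data.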

MySQL doesn't implement this (specific) algorithm natively, but some people have gone ahead and written stored procedures to accomplish the same thing: http://www.artfulsoftware.com/infotree/queries.php#552
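
Without a stored procedure, one option is to fetch candidate rows from MySQL and apply levenshtein() in PHP. The sketch below assumes the rows are already available as associative arrays with 'eng' and 'jap' keys (matching the column names in the question's query), for example from a prepared statement that pre-filters on CHAR_LENGTH(eng). The function name findNearDuplicates() and the threshold are hypothetical.

```php
<?php
// Sketch: keep only rows whose English and Japanese sides are both within
// $maxEdits of the new translation. Assumes $rows were fetched from the
// `eng-jap` table as arrays with 'eng' and 'jap' keys.
function findNearDuplicates(array $rows, string $eng, string $jap, int $maxEdits = 1): array
{
    return array_values(array_filter(
        $rows,
        // levenshtein() is byte-based, so a single multibyte Japanese
        // character can count as several edits.
        fn(array $row): bool =>
            levenshtein($row['eng'], $eng) <= $maxEdits
            && levenshtein($row['jap'], $jap) <= $maxEdits
    ));
}

$rows = [
    ['eng' => "is that mine?", 'jap' => "それは私のですか?"],
    ['eng' => "i'm good",      'jap' => "元気です"],
];
$matches = findNearDuplicates($rows, "is that mine", "それは私のですか?");
echo count($matches), "\n"; // 1
```

Filtering in PHP is simple but scans every candidate row, so some SQL-side narrowing (length, prefix, etc.) matters on a large table.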

In my opinion, a string metric that can handle arbitrary changes is better than stripping out punctuation, and it can also catch omissions, transpositions, etc.
