Locate misspelled searches


Problem description


How can I create a relationship between correctly spelled and misspelled words, such as:
argument - arguement
environment - enviroment
foreign - foriegn
...
in SQL Server? I want to avoid using the LIKE operator.

Recommended answer

Try using SOUNDEX: http://msdn.microsoft.com/en-us/library/ms187384.aspx

It is an interesting function that encodes a string as a four-character code based on how the string sounds. The example below returns the same code for both names:

SELECT SOUNDEX('Smith'), SOUNDEX('Smythe');
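
To see why this helps with the misspellings in the question, here is a minimal Python sketch of the classic American Soundex rules. It is an illustration only: the function name `soundex` and the table `CODES` are made up for this example, and SQL Server's own SOUNDEX may differ in edge cases, so treat it as an approximation of the idea rather than a reimplementation.

```python
# Consonant groups of classic American Soundex; vowels and y carry no code.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        CODES[ch] = digit


def soundex(word):
    """Return a four-character Soundex-style code (hypothetical helper)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # code of the previous letter
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal codes
        code = CODES.get(ch, "")         # "" for vowels and y
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                      # a vowel resets prev, so repeats count
    return result.ljust(4, "0")          # pad short codes with zeros


pairs = [("argument", "arguement"),
         ("environment", "enviroment"),
         ("foreign", "foriegn")]
for right, wrong in pairs:
    print(right, soundex(right), wrong, soundex(wrong))
```

Under these rules each pair from the question maps to one code, so in principle the words could be matched by comparing stored codes instead of using LIKE. The approach ignores vowels and always keeps the first letter, so a typo in the first letter (for example "nevironment") yields a different code.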


It is not so simple. You could add an intermediate tier with a big dictionary (its size depends on how many distinct words are actually in the database) and some criterion for "close spelling", so that you can give the user suggestions. Note that this criterion is far from the usual string comparison and cannot be based on alphanumeric order. For example, "environment" and "nevironment" should be considered close in spelling, but alphanumeric order would say that, say, "envy" is much closer to "environment" than "nevironment" is. Do you see the point? You will need a weighted criterion that takes into account the number of matching letters and the positions of those letters.
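
One well-known criterion of this kind is edit distance. The Python sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance, which counts insertions, deletions, substitutions and swaps of adjacent letters. This is a swapped-in technique chosen for illustration, not something the answer names; the function names `osa_distance` and `suggest`, the sample dictionary and the threshold of 2 are all assumptions.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute, adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
            # Two adjacent letters swapped, e.g. "ie" versus "ei".
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance, closest first."""
    scored = [(osa_distance(word, entry), entry) for entry in dictionary]
    return [entry for dist, entry in sorted(scored) if dist <= max_distance]


words = ["argument", "environment", "envy", "foreign"]  # sample dictionary
print(osa_distance("nevironment", "environment"))
print(osa_distance("envy", "environment"))
print(suggest("nevironment", words))
```

With this measure "nevironment" is one swap away from "environment", while "envy" is eight edits away, which matches the intuition described above. Comparing every query against a whole dictionary is quadratic work per pair, so a real intermediate tier would likely need some pre-filtering.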

I suspect there are cleverer algorithms that I don't know about. You could even run into patent infringement; see, for example: http://www.google.com/patents/US6047300.

As you know, modern search engines use something like this, yet their results may still look questionable, and I'm afraid they keep their algorithms secret.

Sorry that this is not a complete answer, but just understanding that the matter is quite complicated could be helpful.

-SA

