Oracle Fuzzy text search

Question

How would one go about doing a fuzzy name search in Oracle?

For example:

Our data system has the preferred mailing name as:

Mr. Nicolas Jim Raleigh

But in Facebook, or another search field, the name passed to the algorithm is:

Nick Jim Raleigh

The process would run the search name against all of the preferred names, then return the result that contains the most matching characters:

Mr. [Nic]olas Jim Raleigh

[Nic]k Jim Raleigh

16 out of my searched name's 17 characters appear in the preferred name, and we could return a ranked suggestion.

[Edited to add]

After the initial suggestion, and after reading about Oracle's Text query options, I created an index on the table:

create index ADD_EX_INDX3 on address_extract(pref_mail_name) 
  indextype is ctxsys.context 
  parameters ('DATASTORE CTXSYS.DEFAULT_DATASTORE');

and can now successfully retrieve results with:

select score(1), ae.pref_mail_name
from address_extract ae
 where contains(pref_mail_name,'fuzzy(raleigh,,,weight)',1) > 0
order by score(1) desc

which returns:

100 Mr. Raleigh H. Jameson
100 Mr. Nicolas Jim Raleigh
100 Ms. Susanne M. Raleigh
66  Mrs. LaReign Smith
66  Ms. Rahil Smith
62  Mr. Smith  Ragalie

I am struggling to do a full name search, however. How would I go about searching on the full name?

Recommended answer

Name matching is hard. Oracle's Text indexing supports fuzzy matching and stemming, which is a start, but consider these names:

  • Nicolas Raleigh
  • Nihcolas Raleigh
  • Nico Raleigh
  • Nik Raleigh
  • Nicky Raleigh
  • Nick Raleigh
  • Nicholas Raleigh
  • Nikola Raleigh
  • Nikki Raleigh
  • Nicola Raleigh
  • Nicolai Raleigh
  • Nikolaj Raleigh

Attempting to match those through abstractions, be it Levenshtein Distance or Double Metaphone, is going to generate false positives and false negatives. This is the nature of abstraction. The best way to get a focused and accurate result set is with a thesaurus (and even this isn't perfect). Unfortunately, assembling a comprehensive thesaurus of names is a gigantic undertaking; to get a sense of the task, check out the stats on the NameX site.
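As a rough starting point for the full-name case, each token of the search name can be given its own fuzzy() term, with the terms combined using Oracle Text's ACCUM operator so that rows matching more tokens score higher. The sketch below assumes the ADD_EX_INDX3 index from the question is in place, and it adds UTL_MATCH.EDIT_DISTANCE_SIMILARITY (a Levenshtein-based score from 0 to 100) as a tie-breaker. The column aliases and the idea of combining the two scores are my own assumptions, not something the question or Oracle prescribes, and the same caveats about false positives and false negatives apply.

select score(1) as text_score,
       -- Levenshtein-based similarity between the whole stored name and the search name
       utl_match.edit_distance_similarity(
         lower(ae.pref_mail_name), lower('Nick Jim Raleigh')) as edit_similarity,
       ae.pref_mail_name
from address_extract ae
 -- one fuzzy() term per token; ACCUM rewards rows that match more of them
 where contains(ae.pref_mail_name,
         'fuzzy(nick,,,weight) accum fuzzy(jim,,,weight) accum fuzzy(raleigh,,,weight)',
         1) > 0
order by text_score desc, edit_similarity desc

In application code the search string would be split into tokens and the CONTAINS expression built from them, with any Oracle Text reserved words or special characters escaped first.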

Update: Oracle 11gR2 includes an extension to Oracle Text tailored to name searching. This is very neat, and definitely the first place to start; see the Oracle Text documentation to find out more.
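To my understanding, the name-search extension is exposed through NDATA sections and the NDATA query operator. The following is a minimal sketch of how that might be wired up for the table in the question. The preference, section group and index names (name_ds, name_sg, add_ex_name_idx) are hypothetical. It also assumes the existing context index has to be dropped first, since I believe Oracle will not allow two domain indexes of the same type on one column. Check the exact syntax against the Oracle Text reference for your release before relying on it.

-- Sketch only: object names are hypothetical.
drop index ADD_EX_INDX3;

begin
  -- MULTI_COLUMN_DATASTORE wraps the column in <PREF_MAIL_NAME> tags,
  -- which lets it be declared as a section
  ctx_ddl.create_preference('name_ds', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('name_ds', 'columns', 'pref_mail_name');

  -- declare that section as NDATA (name data)
  ctx_ddl.create_section_group('name_sg', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_ndata_section('name_sg', 'pref_mail_name', 'pref_mail_name');
end;
/

create index add_ex_name_idx on address_extract(pref_mail_name)
  indextype is ctxsys.context
  parameters ('datastore name_ds section group name_sg');

-- the full search name goes into a single NDATA() call
select score(1), ae.pref_mail_name
from address_extract ae
 where contains(ae.pref_mail_name, 'ndata(pref_mail_name, Nick Jim Raleigh)', 1) > 0
order by score(1) desc

Unlike the per-token fuzzy() approach, the whole name is passed in one go, so no tokenising is needed on the application side.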
