How can I use Lucene for personal name (first name, last name) search?

Question

I'm writing a search feature for a database of NFL players.

The user enters a search string like "Jason Campbell" or "Campbell" or "Jason".

I'm having trouble getting the appropriate results.

Which Analyzer should I use when indexing? Which Query should I use when querying? Should I distinguish between first name and last name, or just index the full name string?

I'd like the following behavior:

Query: "Jason Campbell" -> Result: exact match for 1 player, Jason Campbell

Query: "Campbell" -> Result: all players with Campbell in their name

Query: "Jason" -> Result: all players with Jason in their name

Query: "Cambel" [misspelled] -> Result: all players with Campbell in their name

Recommended answer

StandardAnalyzer should work fine for all of the above queries. Enclose your first query in double quotes to get an exact match; your last query requires a fuzzy query. For example, you could search for Cambell~0.5 and get Campbell as a match (the numeric value after the tilde indicates the fuzziness).
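To make the three query styles concrete, here is a minimal Python sketch of the matching behavior described above. It is a toy model, not Lucene's API: the `tokenize` function only approximates what StandardAnalyzer does for simple names (lowercasing and splitting on whitespace), `PLAYERS` is a made-up sample list, and the fuzzy match uses a maximum edit distance rather than the 0.5 similarity value from the answer. As far as I know, Lucene 4.0 and later also read the number after the tilde as a maximum edit count (0 to 2), so treat the exact threshold here as an assumption.

```python
# Illustrative sample data, not a real index.
PLAYERS = ["Jason Campbell", "Calais Campbell", "Jason Witten", "Tom Brady"]


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def term_search(term, names):
    """Single-term query: the term must equal one token of the name."""
    t = term.lower()
    return [n for n in names if t in tokenize(n)]


def phrase_search(phrase, names):
    """Quoted query: the tokens must appear adjacent and in order."""
    words = tokenize(phrase)
    k = len(words)
    hits = []
    for n in names:
        toks = tokenize(n)
        if any(toks[i:i + k] == words for i in range(len(toks) - k + 1)):
            hits.append(n)
    return hits


def fuzzy_search(term, names, max_edits=2):
    """Fuzzy query: some token is within max_edits edits of the term."""
    t = term.lower()
    return [n for n in names
            if any(edit_distance(t, tok) <= max_edits for tok in tokenize(n))]


if __name__ == "__main__":
    print(phrase_search("Jason Campbell", PLAYERS))
    print(term_search("Campbell", PLAYERS))
    print(term_search("Jason", PLAYERS))
    print(fuzzy_search("Cambel", PLAYERS))
```

Because matching happens per token, indexing the full name as a single analyzed field is enough for this sketch to satisfy all four desired behaviors; separate first-name and last-name fields would only matter if the two needed to be queried or weighted differently.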

By the way, I would suggest using Solr, which provides spell-check and auto-suggest features, so you wouldn't have to reinvent the wheel. This is similar to Google's "Did you mean...?" feature.
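The "did you mean" idea can be sketched in a few lines of Python using the standard library's `difflib`. This is not how Solr's spell-check component works internally; it only shows the general shape of suggesting the closest known term. The `KNOWN_TERMS` vocabulary, the `did_you_mean` helper, and the 0.7 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical vocabulary, e.g. the distinct name tokens found in the index.
KNOWN_TERMS = ["jason", "campbell", "witten", "brady"]


def did_you_mean(term, vocabulary=KNOWN_TERMS, cutoff=0.7):
    """Return the closest known term, or None if the term is already known
    or nothing in the vocabulary is similar enough."""
    term = term.lower()
    if term in vocabulary:
        return None  # spelled correctly, no suggestion needed
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    suggestion = did_you_mean("Cambel")
    if suggestion:
        print(f"Did you mean: {suggestion}?")
```

A suggestion step like this complements fuzzy matching: instead of silently widening the result set, it lets the user confirm the corrected term before the search is rerun.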
