Lucene Index and Query Design Question - Searching People

Views: 141 | Posted: 2018/8/2 15:59:20 | Tags: indexing, lucene, lucene.net

Problem Description

I have recently started working with Lucene (specifically, Lucene.Net) and have successfully created several indices without any problems. Having previously worked with Endeca, I find that Lucene is lightweight, powerful, and has a much gentler learning curve (due mostly to its concise API).

However, I have one specific index/query situation that I am having trouble wrapping my head around. What I have is a person directory. People can be searched for in this application, with the goal of returning both exact and approximate matches. Right now, in the index I concatenate "FirstName" and "LastName" into a single field called "FullName", adding a space between the two. So FirstName:Jon with LastName:Smith yields FullName:Jon Smith. I do anticipate the possibility of middle names and possibly suffixes, but that is not important at the moment.

I would like to do the equivalent of a fuzzy search on the name, so someone searching for "John Smith" would still get back "Jon Smith". I had thought about a multisearch; however, this becomes more involved if the name is actually "Jon Del Carmen" or "Jon Paul Del Carmen". Nothing in what the user types delineates the first-name and last-name pieces.

The only thought I have is that I could replace the spaces in the concatenated value with a character that would not be discarded. If I did this both when building the document for the index and when parsing the query, I could treat the name as one larger word, right? Is there another way to do this that would work for both simple names ("Jon Smith") and more complex names ("Jon Paul Del Carmen")?
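To make the "one larger word" idea concrete, here is a minimal sketch in plain Java (no Lucene dependency). The class name, the method names, and the choice of `_` as the joining character are all hypothetical, and the sketch assumes the analyzer used at index and query time keeps `_` inside a token, which depends on the analyzer. Fuzzy matching of this kind is based on edit distance, so the sketch also computes the Levenshtein distance between the joined tokens to show how far apart simple and longer names end up.

```java
import java.util.Locale;

// Illustrative only: joins a full name into one token and measures edit distance.
final class NameTokenSketch {
    // Hypothetical joiner; assumes the analyzer does not split or drop '_'.
    static final String JOINER = "_";

    // "Jon Smith" -> "jon_smith"; runs of whitespace collapse to one joiner.
    static String toSingleToken(String fullName) {
        return fullName.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", JOINER);
    }

    // Classic two-row Levenshtein distance (insert, delete, substitute = 1 each).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexed = toSingleToken("Jon Smith");
        String query = toSingleToken("John Smith");
        System.out.println(indexed + " vs " + query + ": " + editDistance(indexed, query));
    }
}
```

With this approach a one-letter misspelling stays close to the indexed token, but a query that omits a middle name ("Jon Del Carmen" against "Jon Paul Del Carmen") is several edits away, which is the weakness the question hints at for more complex names.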

Any advice would be greatly appreciated. Thanks in advance!

Edit: Additional details are below.

In Luke, I entered the following query:

FullName:jonn smith~

It is parsed as:

FullName:jonn CreatedOn:smith~0.5

Explanation:

BooleanQuery:boost=1.0000
    clauses=2, maxClauses=1024
    Clause 0: SHOULD
        TermQuery:boost=1.0000
            Term: field='FullName' text='jonn'
    Clause 1: SHOULD
        FuzzyQuery: boost=1.0000
            prefixLen=0, minSimilarity=0.5000
            org.apache.lucene.search.FuzzyTermEnum: diff=-1.0000
            FilteredTermEnum: Exception null

"CreatedOn" is another field in the index. I tried putting quotes around the term "jonn smith", but the parser then treats it as a PhraseQuery instead. I am sure the problem is that I am simply not doing something right, but being so green at all of this, I am not sure what that something is.
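The parse shown above is consistent with the classic query syntax, in which a field prefix binds only to the term that immediately follows it and `~` applies only to the single term before it; "smith~" therefore falls through to the default field (here apparently "CreatedOn"). One possible way around this, sketched below in plain Java, is to qualify every whitespace-separated term with the field and the fuzzy operator before handing the string to the parser. The class and method names are hypothetical, and the sketch assumes the user input contains no query-syntax special characters, since it does no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds a query string in which every term is
// field-qualified and fuzzy, e.g. "FullName:jonn~ FullName:smith~".
final class FuzzyNameQuerySketch {
    // Assumes userInput has no special query characters (no escaping is done).
    static String build(String field, String userInput) {
        List<String> clauses = new ArrayList<>();
        for (String term : userInput.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                clauses.add(field + ":" + term + "~");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("FullName", "jonn smith"));
    }
}
```

Because each clause is generated per term, the same helper handles longer inputs such as "Jon Paul Del Carmen" without needing to know which parts are first or last names. How well the resulting matches rank still depends on the analyzer and the similarity threshold in use.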
