How to perform a 'contains' search rather than 'starts with' using Lucene.Net
Question
We use Lucene.NET to implement full-text search on a client's website. The search itself already works, but we now want to modify it.
Currently every term gets a * appended, which leads Lucene to perform what I would classify as a StartsWith search.
In the future we would like the search to behave like a Contains rather than a StartsWith.
We use:
- Lucene.Net 2.9.2.2
- StandardAnalyzer
- the default QueryParser
Example:
(Title:Orch*)
matches: Orchestra
But:
(Title:rch*)
does not match: Orchestra
We want both the first and the second query to match Orchestra.
Basically I want the exact opposite of what was asked in this question. I'm not sure why, for that person, Lucene performed a Contains rather than a StartsWith by default:
Why is this Lucene query a "contains" instead of a "startsWith"?
How can we make this happen?
I have the feeling it has something to do with the Analyzer, but I'm not sure.
Recommended answer
First off, I assume you're using StandardAnalyzer or something similar. The question you linked misses the point that you search for terms, not whole field values: in that case a* matches "Fleet Africa" because the text is tokenized into the terms "fleet" and "africa", and "africa" starts with "a".
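To make the term-level matching concrete, here is a minimal sketch in plain Python. It is a model of the behaviour, not Lucene.Net code: tokenize is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which only roughly approximates what StandardAnalyzer does.

```python
import re

def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def prefix_matches(field_value, prefix):
    """A prefix query such as a* is tested against each indexed term, not the whole string."""
    return any(term.startswith(prefix.lower()) for term in tokenize(field_value))

print(tokenize("Fleet Africa"))             # ['fleet', 'africa']
print(prefix_matches("Fleet Africa", "a"))  # True: the term "africa" starts with "a"
print(prefix_matches("Orchestra", "rch"))   # False: no term starts with "rch"
```

Under these assumptions the "contains"-looking result in the linked question is just a prefix match on the second term.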
You need to call QueryParser.SetAllowLeadingWildcard(true) to be able to write queries like field:*value*. Are you actually changing the string that's passed to QueryParser?
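The effect of a leading wildcard can be sketched in plain Python. This models the behaviour only and is not the Lucene.Net API: wildcard_search, its allow_leading_wildcard flag and the term list are made up for illustration, and fnmatchcase stands in for Lucene's wildcard matching.

```python
from fnmatch import fnmatchcase

def wildcard_search(terms, pattern, allow_leading_wildcard=False):
    """Return the terms matching a wildcard pattern (* = any run of characters, ? = one)."""
    if pattern[:1] in ("*", "?") and not allow_leading_wildcard:
        # Models the parser refusing a leading wildcard unless it is switched on.
        raise ValueError("leading wildcard not allowed")
    return [t for t in terms if fnmatchcase(t, pattern)]

terms = ["orchestra", "orchid", "march"]  # made-up index terms

print(wildcard_search(terms, "orch*"))  # ['orchestra', 'orchid']
print(wildcard_search(terms, "*rch*", allow_leading_wildcard=True))  # all three terms
```

Note that the sketch has to test every term once the pattern starts with a wildcard; the same reasoning is why leading wildcards tend to be expensive on a large index.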
You could parse the query as usual, and then implement a QueryVisitor that rewrites every TermQuery into a WildcardQuery. That way you still support phrase searches.
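A rough sketch of that rewriting step, in plain Python rather than Lucene.Net: the four query classes below are hypothetical, minimal stand-ins that only borrow the Lucene names, and to_contains is an assumed helper playing the role of the visitor.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, minimal query classes for illustration only; not the Lucene.Net types.
@dataclass(frozen=True)
class TermQuery:
    field: str
    text: str

@dataclass(frozen=True)
class WildcardQuery:
    field: str
    pattern: str

@dataclass(frozen=True)
class PhraseQuery:
    field: str
    words: Tuple[str, ...]

@dataclass(frozen=True)
class BooleanQuery:
    clauses: tuple

def to_contains(query):
    """Rewrite every TermQuery into a *text* WildcardQuery; leave other nodes intact."""
    if isinstance(query, TermQuery):
        return WildcardQuery(query.field, "*" + query.text + "*")
    if isinstance(query, BooleanQuery):
        return BooleanQuery(tuple(to_contains(c) for c in query.clauses))
    return query  # phrases (and anything else) pass through unchanged

parsed = BooleanQuery((TermQuery("Title", "rch"),
                       PhraseQuery("Title", ("string", "quartet"))))
rewritten = to_contains(parsed)
print(rewritten)
```

Because the rewrite happens on the parsed tree instead of on the raw query string, quoted phrases never get wildcards spliced into them.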
That said, I see nothing good in rewriting queries into prefix or wildcard queries. An orc or a chest has very little in common with an Orchestra, yet a contains search for either word would match it. Instead, hook your customer up with an analyzer that supports stemming and synonyms, and provide a spell-correction feature to fix simple search mistakes.