How to perform a 'contains' search rather than 'starts with' using Lucene.Net
Question
We use Lucene.NET to implement full-text search on a client's website. The search itself already works, but we now want to implement a modification.
Currently every term gets a * appended, which leads Lucene to perform what I would classify as a StartsWith search.
In the future we would like a search that performs something like a Contains rather than a StartsWith.
We use:
- Lucene.Net 2.9.2.2
- StandardAnalyzer
- the default QueryParser
Samples:
(Title:Orch*)
matches: Orchestra
but:
(Title:rch*)
does not match: Orchestra
We want both the first and the second query to match Orchestra.
Basically I want the exact opposite of what was asked in the following question. I'm not sure why, for that person, Lucene performed a Contains rather than a StartsWith by default:
Why is this Lucene query a "contains" instead of a "startsWith"?
How can we make this happen?
I have a feeling it has something to do with the Analyzer, but I'm not sure.
Recommended answer
First off, I assume you're using StandardAnalyzer, or something similar. The question you linked fails to recognize that Lucene searches for terms: in that case, a* matches "Fleet Africa" because the text is tokenized into "fleet" and "africa".
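To make the term-level matching concrete, here is a small library-independent sketch in Python (it is not Lucene.Net code). It assumes a simplified analyzer that lowercases and splits on non-alphanumeric characters, which only approximates what StandardAnalyzer does; the function names `tokenize` and `matches` are made up for illustration.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Simplified stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches(pattern, text):
    """Return True if any single term of the text matches the wildcard pattern."""
    pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in tokenize(text))


print(tokenize("Fleet Africa"))        # terms the text is split into
print(matches("a*", "Fleet Africa"))   # prefix query tested against each term
print(matches("rch*", "Orchestra"))    # prefix query: no term starts with "rch"
print(matches("*rch*", "Orchestra"))   # leading and trailing wildcard
```

The point of the sketch is that a pattern is compared against each term separately, never against the whole field value, so a prefix query can look like a Contains search whenever the matching word happens to sit in the middle of the text.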
You need to call QueryParser.SetAllowLeadingWildcard(true) to be able to write queries like field:*value*. Are you actually changing the string that's passed to QueryParser?
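If the query string is assembled in code before it reaches QueryParser, the switch from StartsWith to Contains could look roughly like the Python sketch below. The helper name `to_contains_query` is hypothetical, and the sketch assumes that terms are separated by whitespace, that quoted phrases are not used, and that the input contains no Lucene special characters that would need escaping. The resulting string would only parse once leading wildcards are allowed on the parser, as described above.

```python
def to_contains_query(field, user_input):
    """Build a query string in which every term is wrapped as field:*term*."""
    clauses = []
    for raw in user_input.split():
        term = raw.strip("*")  # drop wildcards the caller may already have added
        if term:
            clauses.append(f"{field}:*{term}*")
    return " ".join(clauses)


print(to_contains_query("Title", "rch"))
print(to_contains_query("Title", "Orch* quartet"))
```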
You could parse the query as usual, and then implement a QueryVisitor that rewrites every TermQuery into a WildcardQuery. That way you still support phrase searches.
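The rewriting idea can be sketched independently of Lucene.Net. In the Python sketch below, the classes are toy stand-ins that merely borrow the names of the Lucene query types; they are not the library's API, and details such as MUST/SHOULD flags on boolean clauses are left out as a simplifying assumption. The function `to_contains` is likewise a hypothetical name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TermQuery:          # toy stand-in: a single term in a field
    field: str
    text: str


@dataclass(frozen=True)
class WildcardQuery:      # toy stand-in: a wildcard pattern in a field
    field: str
    pattern: str


@dataclass(frozen=True)
class PhraseQuery:        # toy stand-in: an ordered sequence of words
    field: str
    words: Tuple[str, ...]


@dataclass(frozen=True)
class BooleanQuery:       # toy stand-in: a container of sub-queries
    clauses: tuple


def to_contains(query):
    """Recursively replace each TermQuery with a *term* WildcardQuery."""
    if isinstance(query, TermQuery):
        return WildcardQuery(query.field, f"*{query.text}*")
    if isinstance(query, BooleanQuery):
        return BooleanQuery(tuple(to_contains(c) for c in query.clauses))
    return query  # phrases and any other query types are left untouched


parsed = BooleanQuery((
    TermQuery("Title", "rch"),
    PhraseQuery("Title", ("string", "quartet")),
))
rewritten = to_contains(parsed)
print(rewritten)
```

Because only single-term nodes are replaced, the phrase clause comes out of the rewrite unchanged, which is the advantage over editing the raw query string.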
I see nothing good in rewriting queries into prefix or wildcard queries. An orc or a chest has very little in common with an Orchestra, yet both words would match. Instead, set your customer up with an analyzer that supports stemming and synonyms, and provide a spell-correction feature to fix simple search mistakes.