How to perform a 'contains' search rather than 'starts with' using Lucene.Net

Question

We use Lucene.NET to implement a full-text search on a client's website. The search itself already works, but we now want to modify it.

Currently a * is appended to every term, which leads Lucene to perform what I would classify as a StartsWith search.

In the future we would like the search to behave like a Contains rather than a StartsWith.

We use:

  • Lucene.Net 2.9.2.2
  • StandardAnalyzer
  • the default QueryParser

Example:

(Title:Orch*) matches: Orchestra

But:

(Title:rch*) does not match: Orchestra

We want both the first and the second query to match Orchestra.

Basically I want the exact opposite of what was asked in this question; I'm not sure why Lucene performed a Contains rather than a StartsWith by default for that person:
Why is this Lucene query a "contains" instead of a "startsWith"?

How can we make this happen?
I have the feeling it has something to do with the Analyzer, but I'm not sure.

Recommended answer

First off, I assume you're using StandardAnalyzer or something similar. The linked question overlooks the fact that Lucene searches for terms, not whole field values: in that case a* matches "Fleet Africa" because the text is tokenized into "fleet" and "africa", and "africa" starts with "a".
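The term-level behaviour described above can be modelled in a few lines. This is a conceptual sketch in Python, not Lucene.Net code: `tokenize` is a rough stand-in for an analyzer (it only splits on non-alphanumeric characters and lowercases), and `field_matches` is a hypothetical helper that tests a wildcard pattern against each token separately, with both sides lowercased for simplicity.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Rough stand-in for an analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def field_matches(pattern, text):
    """Return True if any single token of `text` matches the wildcard pattern.

    The pattern is compared against each token on its own, never against
    the whole field value, which is the point the answer is making.
    """
    pattern = pattern.lower()
    return any(fnmatchcase(token, pattern) for token in tokenize(text))


print(field_matches("a*", "Fleet Africa"))   # True: the token "africa" starts with "a"
print(field_matches("Orch*", "Orchestra"))   # True: prefix of the token "orchestra"
print(field_matches("rch*", "Orchestra"))    # False: no token starts with "rch"
print(field_matches("*rch*", "Orchestra"))   # True: a leading wildcard gives "contains"
```

In this model a prefix pattern looks like a "contains" search whenever the field holds several tokens, and only a leading wildcard gives a true within-term "contains".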

You need to call QueryParser.SetAllowLeadingWildcard(true) to be able to write queries like field:*value*. Are you actually changing the string that's passed to QueryParser?
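If the application does build the query string itself, the change from "starts with" to "contains" amounts to wrapping each term in wildcards instead of only appending one. The sketch below shows that string manipulation in Python; `to_contains_query` and the default field name `Title` are illustrative assumptions, and the helper does not escape query-syntax characters. The resulting string would only be accepted once leading wildcards are enabled on the parser as described above.

```python
def to_contains_query(user_input, field="Title"):
    """Wrap every whitespace-separated term as field:*term*.

    Wildcards the user typed at either end of a term are stripped first,
    so "orch*" becomes "*orch*" rather than "*orch**". Terms that consist
    only of asterisks are dropped.
    """
    clauses = []
    for raw_term in user_input.split():
        term = raw_term.strip("*")
        if term:
            clauses.append(f"{field}:*{term}*")
    return " ".join(clauses)


print(to_contains_query("rch"))         # Title:*rch*
print(to_contains_query("orch* hall"))  # Title:*orch* Title:*hall*
```

Keep in mind that a leading wildcard forces the engine to examine every term in the field, so such queries tend to be noticeably slower than prefix queries on large indexes.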

Alternatively, you could parse the query as usual and then implement a QueryVisitor that rewrites every TermQuery into a WildcardQuery. That way you still support phrase searches.
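The rewrite idea can be illustrated with a toy query tree. The classes below are simplified Python stand-ins that only borrow the names of Lucene's query types (no occur flags, boosts, or other details), and `rewrite` is a hypothetical function playing the role of the visitor: it turns term queries into wildcard queries, recurses into boolean queries, and leaves everything else, including phrase queries, untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermQuery:
    field: str
    text: str


@dataclass(frozen=True)
class WildcardQuery:
    field: str
    pattern: str


@dataclass(frozen=True)
class PhraseQuery:
    field: str
    words: tuple


@dataclass(frozen=True)
class BooleanQuery:
    clauses: tuple


def rewrite(query):
    """Return a copy of the tree with every TermQuery replaced by *text*."""
    if isinstance(query, TermQuery):
        return WildcardQuery(query.field, f"*{query.text}*")
    if isinstance(query, BooleanQuery):
        return BooleanQuery(tuple(rewrite(clause) for clause in query.clauses))
    # Phrase queries (and any other type) pass through unchanged.
    return query


parsed = BooleanQuery((
    TermQuery("Title", "rch"),
    PhraseQuery("Title", ("fleet", "africa")),
))
print(rewrite(parsed))
```

Because the rewrite runs on the parsed tree rather than on the raw string, a quoted phrase never has wildcards injected into it.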

I see no benefit in rewriting queries into prefix or wildcard queries. An orc or a chest has very little in common with an Orchestra, yet both words would match it. Instead, hook your customer up with an analyzer that supports stemming and synonyms, and provide a spelling-correction feature to fix simple search mistakes.
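The over-matching concern is easy to reproduce. The following Python sketch uses a hypothetical helper, `hits`, to check a user's term against one indexed term under both strategies; it models plain wildcard matching only, not Lucene's scoring or analysis.

```python
from fnmatch import fnmatchcase


def hits(user_term, indexed_term):
    """Return (prefix_hit, contains_hit) for one user term and one indexed term."""
    prefix_hit = fnmatchcase(indexed_term, user_term + "*")
    contains_hit = fnmatchcase(indexed_term, "*" + user_term + "*")
    return prefix_hit, contains_hit


print(hits("orc", "orchestra"))    # (True, True)
print(hits("chest", "orchestra"))  # (False, True)
```

A user looking for "orc" or "chest" would therefore be shown the Orchestra entry, even though the words are unrelated.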
