Howto perform a 'contains' search rather than 'starts with' using Lucene.Net


Question

We use Lucene.NET to implement a full-text search on a client's website. The search itself already works, but we now want to implement a modification.

Currently a * is appended to every term, which leads Lucene to perform what I would classify as a StartsWith search.

In the future we would like to have a search that performs something like a Contains rather than a StartsWith.

We use:


  • Lucene.Net 2.9.2.2

  • StandardAnalyzer

  • the default QueryParser

Example:

(Title:Orch*) matches: Orchestra

but:

(Title:rch*) does not match: Orchestra

We want the first and the second one to both match Orchestra.

Basically I want the exact opposite of what was asked in this question. I'm not sure why, for this person, Lucene performed a Contains rather than a StartsWith by default:
Why is this Lucene query a "contains" instead of a "startsWith"?

How can we make this happen?
I have the feeling it has something to do with the Analyzer, but I'm not sure.

Recommended answer

First off, I assume you're using StandardAnalyzer, or something similar. The question you linked fails to take into account that you search for terms, not whole field values: in that case a* matches "Fleet Africa" because the text is tokenized into "fleet" and "africa".
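The term-level matching described above can be illustrated with a small, library-free Python sketch. It is not Lucene code: `tokenize` is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, which roughly approximates what StandardAnalyzer does for simple English text.

```python
import re

def tokenize(text):
    """Rough stand-in for an analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def prefix_matches(prefix, text):
    """True if any term produced from `text` starts with `prefix`, like the query prefix*."""
    prefix = prefix.lower()
    return any(term.startswith(prefix) for term in tokenize(text))

print(tokenize("Fleet Africa"))               # ['fleet', 'africa']
print(prefix_matches("a", "Fleet Africa"))    # True: the term "africa" starts with "a"
print(prefix_matches("rch", "Orchestra"))     # False: no term starts with "rch"
```

So a prefix query can look like a "contains" on the whole field value while it is really a "starts with" on one of the individual terms.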

You need to call QueryParser.SetAllowLeadingWildcard(true) to be able to write queries like field:*value*. Are you actually changing the string that's passed to QueryParser?
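To make the effect of the leading wildcard concrete, here is a minimal Python sketch of wildcard matching against indexed terms. The function `wildcard_matches` and its `allow_leading_wildcard` flag are hypothetical and only imitate the parser setting named above. Python's `fnmatchcase` is used as an approximation of Lucene's `*` and `?` wildcards (it also understands `[...]` sets, which Lucene wildcards do not).

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, terms, allow_leading_wildcard=False):
    """Return the terms matched by a * / ? pattern.

    Imitates (as an assumption) a parser that rejects patterns starting
    with a wildcard unless that has been explicitly enabled.
    """
    if pattern[:1] in ("*", "?") and not allow_leading_wildcard:
        raise ValueError("leading wildcard not allowed: " + pattern)
    return [t for t in terms if fnmatchcase(t, pattern)]

terms = ["orchestra", "fleet", "africa"]

print(wildcard_matches("orch*", terms))                               # ['orchestra']
print(wildcard_matches("rch*", terms))                                # []
print(wildcard_matches("*rch*", terms, allow_leading_wildcard=True))  # ['orchestra']
```

Note that a pattern with a leading wildcard has to be checked against every term, because no common prefix narrows the candidates. In a real index this can be slow on large term dictionaries.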

You could parse the query as usual, and then implement a QueryVisitor that rewrites every TermQuery into a WildcardQuery. That way you still support phrase searches.
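The rewriting idea can be sketched independently of Lucene. The Python classes below are hypothetical stand-ins that merely borrow the Lucene names (`TermQuery`, `WildcardQuery`, `PhraseQuery`, `BooleanQuery`), and `to_contains` plays the role of the visitor: it walks the parsed query tree, wraps every single-term query in `*...*`, and leaves phrase queries untouched.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, simplified query types; not the Lucene.Net classes themselves.
@dataclass(frozen=True)
class TermQuery:
    field: str
    text: str

@dataclass(frozen=True)
class WildcardQuery:
    field: str
    pattern: str

@dataclass(frozen=True)
class PhraseQuery:
    field: str
    words: Tuple[str, ...]

@dataclass(frozen=True)
class BooleanQuery:
    clauses: tuple

def to_contains(query):
    """Recursively rewrite TermQuery nodes into *term* WildcardQuery nodes."""
    if isinstance(query, TermQuery):
        return WildcardQuery(query.field, "*" + query.text + "*")
    if isinstance(query, BooleanQuery):
        return BooleanQuery(tuple(to_contains(c) for c in query.clauses))
    return query  # phrase queries and anything else stay as they are

parsed = BooleanQuery((
    TermQuery("title", "rch"),
    PhraseQuery("title", ("fleet", "africa")),
))
print(to_contains(parsed))
```

Working on the parsed tree rather than on the raw query string means the user's quoting and operators survive, which is why phrase searches keep working.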

I see nothing good in rewriting queries into prefix or wildcard queries. An orc or a chest has very little in common with an Orchestra, yet both words will match it. Instead, give your customer an analyzer that supports stemming and synonyms, and provide a spell-correction feature to fix simple search mistakes.
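The false-positive problem is easy to reproduce with a plain substring check. In this small Python sketch the term list is made up for illustration, and `contains_hits` is a hypothetical helper that behaves like a `*fragment*` search over indexed terms.

```python
def contains_hits(fragment, terms):
    """Return every term that contains `fragment`, like a *fragment* wildcard search."""
    fragment = fragment.lower()
    return [t for t in terms if fragment in t]

indexed_terms = ["orchestra", "orchid", "chest", "orc"]  # illustrative data only

print(contains_hits("orc", indexed_terms))    # ['orchestra', 'orchid', 'orc']
print(contains_hits("chest", indexed_terms))  # ['orchestra', 'chest']
```

A user searching for "chest" gets "orchestra" back, which is rarely what they meant.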
