Matching an entire sentence with spaces in a Lucene BooleanQuery

Question

I have the following strings to search:

Tulip INN Riyadhh
 Tulip INN Riyadhh LUXURY
 Suites of Tulip INN RIYAHdhh

I need a search such that, if I enter

 *Tulip INN Riyadhh*

it returns all three of the above. I have a restriction: I must achieve this without a QueryParser or an Analyzer, using only BooleanQuery, WildcardQuery, etc.

Regards, Raghavan

Recommended answer

What you need here is a PhraseQuery. Let me explain.

I don't know which analyzer you're using, but for simplicity I'll suppose you have a very basic one that just splits text into words and converts them to lowercase. Don't tell me you're not using an analyzer: Lucene needs one to do any work, at least at the indexing stage, because the analyzer is what defines the tokenizer and the token filter chain.

Here's how your strings would be tokenized in this example:

  • tulip inn riyadhh
  • tulip inn riyadhh luxury
  • suites of tulip inn riyadhh
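
To make the tokenization concrete, here is a toy stand-in for the "very basic" analyzer assumed above: it splits on whitespace and lowercases each token. This is plain Python, not Lucene's API, and the function name `analyze` is hypothetical; a real Lucene analyzer also records token positions and may apply further filters.

```python
def analyze(text):
    """Hypothetical toy analyzer: whitespace tokenizer + lowercase filter."""
    return [token.lower() for token in text.split()]


# The three strings exactly as typed in the question.
for s in ["Tulip INN Riyadhh", "Tulip INN Riyadhh LUXURY", "Suites of Tulip INN RIYAHdhh"]:
    print(analyze(s))
```

Taken literally, the third string's last word `RIYAHdhh` lowercases to `riyahdhh`, not `riyadhh`, so the list above assumes that spelling is a typo in the question.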

Notice how these all contain the token sequence tulip inn riyadhh. A sequence of tokens is what a PhraseQuery is looking for.
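
A PhraseQuery with its default slop of 0 matches when its terms occur at consecutive positions, in the given order. The sketch below models just that rule over token lists; it is a toy illustration, not how Lucene executes the query (Lucene uses the term positions stored in the index), and `phrase_matches` is a hypothetical name.

```python
def analyze(text):
    # Toy analyzer assumed in the answer: whitespace split + lowercase.
    return [t.lower() for t in text.split()]


def phrase_matches(doc_tokens, phrase_tokens):
    """True if phrase_tokens appear at consecutive positions in doc_tokens
    (toy model of an exact, slop-0 phrase match)."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens across the document and compare.
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


phrase = ["tulip", "inn", "riyadhh"]
print(phrase_matches(analyze("Tulip INN Riyadhh LUXURY"), phrase))  # True
print(phrase_matches(analyze("Riyadhh INN Tulip"), phrase))         # False
```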

In Lucene.Net, building such a query looks like this (untested):

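using Lucene.Net.Index;   // Term
using Lucene.Net.Search;  // PhraseQuery
// "propertyName" stands for the indexed field; terms are added in phrase order
// and must match what the analyzer produced at index time (lowercase here).
var query = new PhraseQuery();
query.Add(new Term("propertyName", "tulip"));
query.Add(new Term("propertyName", "inn"));
query.Add(new Term("propertyName", "riyadhh"));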

Note that the terms need to match those produced by the analyzer (in this example, they're all lowercase). The QueryParser does this job for you by running parts of the query through the analyzer, but you'll have to do it yourself if you don't use the parser.
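
Since the question rules out both QueryParser and Analyzer, "doing it yourself" means applying the index-time normalization by hand before building the terms. A minimal sketch, assuming the index was built with the basic lowercase analyzer supposed earlier; `phrase_terms` is a hypothetical helper that returns the ordered (field, text) pairs you would pass to `new Term(...)`, and `propertyName` is the answer's placeholder field name.

```python
def analyze(text):
    # Must mirror the index-time analyzer; here, the toy whitespace + lowercase one.
    return [t.lower() for t in text.split()]


def phrase_terms(field, user_input):
    """Hypothetical helper: raw user input -> ordered (field, term) pairs,
    one per PhraseQuery.Add(new Term(field, text)) call."""
    return [(field, token) for token in analyze(user_input)]


print(phrase_terms("propertyName", "Tulip INN Riyadhh"))
```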

Now, why wouldn't WildcardQuery or RegexQuery work in this situation? These queries always match a single term, yet you need to match an ordered sequence of terms. For instance, a WildcardQuery with the pattern riyadhh* would find every single term starting with riyadhh, but it can never match the three-word sequence.
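
The single-term limitation can be shown with a toy model. The sketch uses Python's standard `fnmatch.fnmatchcase`, whose `*` and `?` behave like WildcardQuery's wildcards for this purpose; it is not Lucene's implementation, and `wildcard_matching_terms` is a hypothetical name. Each indexed term is tested against the pattern on its own, so a pattern containing a space never matches anything produced by a whitespace tokenizer.

```python
from fnmatch import fnmatchcase


def wildcard_matching_terms(doc_tokens, pattern):
    """Toy model of a wildcard query: the pattern is checked against each
    indexed term individually, never against a run of several terms."""
    return [t for t in doc_tokens if fnmatchcase(t, pattern)]


tokens = ["tulip", "inn", "riyadhh", "luxury"]
print(wildcard_matching_terms(tokens, "riyadhh*"))    # ['riyadhh']
print(wildcard_matching_terms(tokens, "tulip inn*"))  # [] (no single term contains a space)
```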

A BooleanQuery with a collection of TermQuery MUST clauses would match any text that happens to contain these 3 terms in any order - not exactly what you want either.
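
The difference between the two approaches fits in a few lines. This toy model, plain Python and not the Lucene API, treats an all-MUST BooleanQuery as "every term occurs somewhere" and a slop-0 PhraseQuery as "terms adjacent and in order"; the function names and the sample document text are hypothetical.

```python
def analyze(text):
    # Toy analyzer: whitespace split + lowercase.
    return [t.lower() for t in text.split()]


def all_terms_present(doc_tokens, terms):
    """Toy model of a BooleanQuery whose clauses are all MUST TermQuerys:
    every term must occur somewhere; order and adjacency are ignored."""
    return set(terms) <= set(doc_tokens)


def phrase_matches(doc_tokens, terms):
    """Toy exact-phrase (slop 0) check: terms must be adjacent and in order."""
    n = len(terms)
    return n > 0 and any(doc_tokens[i:i + n] == terms
                         for i in range(len(doc_tokens) - n + 1))


terms = ["tulip", "inn", "riyadhh"]
doc = analyze("Riyadhh INN near Tulip Gardens")  # hypothetical document text
print(all_terms_present(doc, terms))  # True: every term occurs somewhere
print(phrase_matches(doc, terms))     # False: not the ordered sequence
```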
