Lucene phrase query with terms in OR


Problem description

Suppose that I have the following documents, whose text field is:

  1. the red house is beautiful
  2. the house is little
  3. the red fish
  4. the red and yellow house is big

What kind of query should I use to retrieve the documents so that, when I search for "red house", they are ranked as follows:

  1. the red house is beautiful and big [matching: red house]
  2. the red and yellow house is big [matching: red x x house]
  3. the house is little [matching: house]
  4. the red fish [matching: red]

What I need is to give a high rank to the documents that match the phrase I searched for, and a lower score to the documents that contain only part of it. Note that the query string could also contain more than two terms.

It is like a PhraseQuery in which each term may or may not appear, and in which the closer together the terms are, the higher the score.
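To make this requirement concrete, here is a toy scoring model in plain Java. It involves no Lucene at all and is not how Lucene's actual scoring works: each distinct query term that appears adds one point (the "OR" part), and when two or more terms appear, a bonus of 1 / (1 + number of extra tokens in the smallest window containing them) rewards proximity. The class and method names (`ProximityOrScore`, `minWindow`, `score`, `rank`) are hypothetical, and unlike a real PhraseQuery this model ignores word order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProximityOrScore {

    // Smallest number of consecutive tokens containing every term in `needed`.
    // Word order is ignored (a real PhraseQuery would care about it).
    static int minWindow(String[] tokens, Set<String> needed) {
        int best = Integer.MAX_VALUE;
        for (int i = 0; i < tokens.length; i++) {
            if (!needed.contains(tokens[i])) continue;
            Set<String> seen = new HashSet<>();
            for (int j = i; j < tokens.length; j++) {
                if (needed.contains(tokens[j])) seen.add(tokens[j]);
                if (seen.size() == needed.size()) {
                    best = Math.min(best, j - i + 1);
                    break;
                }
            }
        }
        return best;
    }

    // Toy score: one point per distinct query term present, plus
    // 1 / (1 + extra tokens inside the window) when two or more terms match.
    static double score(String doc, List<String> query) {
        String[] tokens = doc.toLowerCase().split("\\s+");
        Set<String> docTerms = new HashSet<>(Arrays.asList(tokens));
        Set<String> present = new HashSet<>();
        for (String q : query) {
            if (docTerms.contains(q)) present.add(q);
        }
        double s = present.size();
        if (present.size() >= 2) {
            int extra = minWindow(tokens, present) - present.size();
            s += 1.0 / (1 + extra);
        }
        return s;
    }

    // Sorts by descending toy score; List.sort is stable, so ties keep input order.
    static List<String> rank(List<String> docs, List<String> query) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingDouble((String d) -> score(d, query)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "the red house is beautiful",
            "the house is little",
            "the red fish",
            "the red and yellow house is big");
        List<String> query = List.of("red", "house");
        for (String d : rank(docs, query)) {
            System.out.println(score(d, query) + "  " + d);
        }
    }
}
```

With the query "red house", the adjacent match scores 3.0, the "red x x house" match scores about 2.33, and the two single-term documents both score 1.0. In this toy model they tie, so they simply keep their input order; nothing here decides whether "house" should outrank "red".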

I've tried combining a PhraseQuery with a TermQuery, but the result is not what I need.

What should I do?

Thanks

Recommended answer

Try creating a BooleanQuery composed of TermQuery objects combined with OR (BooleanClause.Occur.SHOULD). This matches documents in which only one of the terms appears, but should give a higher score to documents in which both appear.

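import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
// Recent Lucene versions build a BooleanQuery with BooleanQuery.Builder; new BooleanQuery() + add() only works on old releases.
Query term1 = new TermQuery(new Term("text", "red"));
Query term2 = new TermQuery(new Term("text", "house"));
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(term1, BooleanClause.Occur.SHOULD); // SHOULD = optional clause (OR)
builder.add(term2, BooleanClause.Occur.SHOULD);
BooleanQuery booleanQuery = builder.build();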
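The BooleanQuery above rewards documents that contain more of the terms, but on its own it does not reward terms being close together, which the question also asks for. One common way to add that, sketched below assuming a recent Lucene (8.x/9.x) API, is an extra SHOULD clause holding a sloppy PhraseQuery: it matches when the terms occur within the given slop of the phrase order, and closer matches contribute more to the score. The class name `OrTermsWithProximity`, the slop of 5, and the boost of 2.0 are hypothetical values to tune, not recommendations. TermQuery and PhraseQuery are not analyzed, so the terms must already be in their indexed form (for example, lowercased).

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class OrTermsWithProximity {

    // Hypothetical tuning values: how far apart the terms may be, and how much
    // the proximity clause weighs relative to the individual term clauses.
    static final int SLOP = 5;
    static final float PHRASE_BOOST = 2.0f;

    static Query build(String field, String... terms) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String t : terms) {
            // Each term is optional; matching more terms raises the score.
            builder.add(new TermQuery(new Term(field, t)), BooleanClause.Occur.SHOULD);
        }
        if (terms.length >= 2) {
            // Sloppy phrase clause: adds score when the terms appear near each
            // other, more for closer matches.
            Query phrase = new PhraseQuery(SLOP, field, terms);
            builder.add(new BoostQuery(phrase, PHRASE_BOOST), BooleanClause.Occur.SHOULD);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        System.out.println(build("text", "red", "house"));
    }
}
```

The exact ranking still depends on the analyzer and the similarity in use (document length, term statistics), so the order between single-term matches such as "the house is little" and "the red fish" is not guaranteed by this query alone.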
