Lucene phrase query with terms in OR
Question
Suppose that I have 5 documents whose text field contains the following:
- the red house is beautiful
- the house is little
- the red fish
- the red and yellow house is big
What kind of query should I use so that a search for "red house" retrieves the documents in the following rank order:
- the red house is beautiful and big [matching: red house]
- the red and yellow house is big [matching: red x x house]
- the house is little [matching: house]
- the red fish [matching: red]
What I need is to give a high rank to the documents that match the phrase I searched for, and a lower score to the documents that contain only part of that phrase. Note that the query string could also contain more than two terms.
It is like a PhraseQuery in which each term may or may not appear, and in which the closer the terms are to each other, the higher the score.
I've tried combining a PhraseQuery with a TermQuery, but the result is not what I need.
What should I do?
Thanks
Recommended answer
Try creating a BooleanQuery composed of TermQuery objects combined with OR (BooleanClause.Occur.SHOULD). This will match documents in which only one term appears, but should give a higher score to those in which both appear.
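import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
// One TermQuery per word of the search string, both on the "text" field.
Query term1 = new TermQuery(new Term("text", "red"));
Query term2 = new TermQuery(new Term("text", "house"));
// SHOULD clauses are optional (OR): a document matching more of them scores higher.
// The no-arg constructor is the older API; from Lucene 5.3 on, use BooleanQuery.Builder.
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(term1, BooleanClause.Occur.SHOULD);
booleanQuery.add(term2, BooleanClause.Occur.SHOULD);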
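The OR query above rewards documents that contain more of the terms, but on its own it does not look at how close the terms are. In Lucene, one common way to add that is to include a PhraseQuery with a non-zero slop as a further SHOULD clause. To make the intended ranking concrete without depending on a Lucene version, here is a toy sketch in plain Java. It is not Lucene's scoring formula: the class name ProximityRankSketch, the one-point-per-term rule, and the 1/(1+gap) bonus are all assumptions chosen only for illustration, and only the first occurrence of each term is considered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model, NOT Lucene's similarity: it only illustrates
// "more matching terms, and closer terms, means a higher score".
class ProximityRankSketch {

    // One point per query term found, plus a bonus (only when every term is
    // present) that shrinks as more unrelated tokens sit between the terms.
    static double score(String doc, String... queryTerms) {
        List<String> tokens = Arrays.asList(doc.toLowerCase().split("\\s+"));
        double score = 0.0;
        int first = Integer.MAX_VALUE;
        int last = -1;
        int matched = 0;
        for (String term : queryTerms) {
            int pos = tokens.indexOf(term); // first occurrence only (simplification)
            if (pos < 0) {
                continue; // term is optional, like a SHOULD clause
            }
            matched++;
            score += 1.0;
            first = Math.min(first, pos);
            last = Math.max(last, pos);
        }
        if (matched > 1 && matched == queryTerms.length) {
            // Tokens inside the matched span that are not query terms.
            int gap = Math.max(0, (last - first) - (matched - 1));
            score += 1.0 / (1 + gap);
        }
        return score;
    }

    // Returns a copy of docs sorted by descending score; ties keep input order.
    static List<String> rank(List<String> docs, String... queryTerms) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((String d) -> score(d, queryTerms)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the red house is beautiful",
                "the house is little",
                "the red fish",
                "the red and yellow house is big");
        for (String doc : rank(docs, "red", "house")) {
            System.out.println(score(doc, "red", "house") + "  " + doc);
        }
    }
}
```

In this sketch the adjacent "red house" document comes first, the "red ... house" document second, and the two single-term documents tie. In real Lucene the order of the single-term documents also depends on term statistics and field length, so that part of the ranking is not guaranteed by the query shape alone.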