Understanding the relationship between boosting a document in Lucene at index time and its corresponding score at search time


Question

When indexing, I boost certain documents, but they do not appear at the top of the list of retrieved documents. I looked at the scores of those documents, and somehow the score of every retrieved document is NaN.

What is the relationship between a document's boost at index time and its score at retrieval time? I thought these would be correlated, and I also expected a wide range of scores in my ScoreDocs, not just NaN. If you can shed some light on this, I would be grateful.

I have read http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/Similarity.html

but I cannot figure out what I am missing.

Here is the simple boosting code:

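// Give selected documents a slightly higher boost before indexing.
if (myCondition)
{
   myDocument.SetBoost(1.1f);
}
// Index the same document instance that received the boost.
myIndexWriter.AddDocument(myDocument);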

Recommended answer

This is a guess, since you haven't provided a sample of your search code, but a common reason for retrieved documents having a score of NaN is that you are using a Sort. When sorting, the document scores are usually not needed, so scoring is disabled by default.

If you use a Sort for your search and still want the scores, look at the setDefaultFieldSortScoring method of the IndexSearcher class. It lets you enable document scoring for searches that use a Sort.
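To make the behavior concrete, here is a minimal toy model in plain Java. It is not Lucene code: the class `SortScoringSketch`, the `Hit` type, and the `searchSorted` method are all hypothetical names invented for illustration. It only mimics the idea described above, namely that a sorted search skips relevance scoring unless asked, so every hit reports NaN. In the Lucene releases that provide the method (2.9 through 3.x, as far as I know), the real switch takes two booleans, along the lines of `searcher.setDefaultFieldSortScoring(true, false)`, called before searching with the Sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model (NOT Lucene code) of why a sorted search can report NaN scores. */
class SortScoringSketch {

    /** Hypothetical stand-in for a search hit: a sort key plus a relevance score. */
    static final class Hit {
        final String sortKey;
        final float score;

        Hit(String sortKey, float score) {
            this.sortKey = sortKey;
            this.score = score;
        }
    }

    /**
     * Returns hits ordered by sortKey. The relevance value is recorded only
     * when trackScores is true; otherwise each hit carries Float.NaN, which
     * is the symptom described in the question.
     */
    static List<Hit> searchSorted(String[] keys, float[] relevance, boolean trackScores) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            float score = trackScores ? relevance[i] : Float.NaN;
            hits.add(new Hit(keys[i], score));
        }
        hits.sort(Comparator.comparing((Hit h) -> h.sortKey));
        return hits;
    }

    public static void main(String[] args) {
        String[] keys = {"b", "a"};
        float[] relevance = {1.1f, 0.7f}; // made-up relevance values

        // Score tracking off (the default when sorting): every score is NaN.
        for (Hit h : searchSorted(keys, relevance, false)) {
            System.out.println(h.sortKey + " " + h.score);
        }
        // Score tracking on: the hits keep their relevance values.
        for (Hit h : searchSorted(keys, relevance, true)) {
            System.out.println(h.sortKey + " " + h.score);
        }
    }
}
```

Note that even with scoring enabled, the hits are still ordered by the sort key, not by score, so a boosted document will only move to the top if the search is ordered by relevance.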
