What does Lucene's ScoreDoc.score mean?

Question

I am performing a boolean query with multiple terms, and I only want to process results whose score is above a particular threshold. My problem is that I don't understand how this value is calculated. I understand that high numbers mean it's a good match and low numbers mean it's a bad match, but there doesn't seem to be any upper bound.

Is it possible to normalize the scores over the range [0,1]?

Answer

Here is a page describing how scores are calculated in Lucene:

http://lucene.apache.org/java/3_0_0/scoring.html

The short answer is that the absolute value of a document's score doesn't really mean anything outside the context of a given search result set. In other words, there isn't really a good way of translating scores into a human definition of relevance, even if you do normalize them.

That being said, you can easily normalize the scores by dividing each hit's score by the maximum score. So if the first hit's score is 2.5 (the first hit has the highest score when results are sorted by relevance), divide every hit's score by 2.5 and you'll get a number between 0 and 1, with the top hit at exactly 1.
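
As a rough illustration of that division, here is a minimal sketch in plain Java. It works on a `float[]` of raw scores, which in a real application you would presumably copy out of the `score` field of each `ScoreDoc` in `TopDocs.scoreDocs`. The class name `ScoreNormalizer` and its methods are hypothetical helpers invented for this example, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: normalizes raw hit scores
// relative to the best score in the same result set.
public class ScoreNormalizer {

    /** Returns a new array where each score is divided by the largest score. */
    public static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out; // no positive scores: leave everything at 0
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }

    /** Returns the positions of hits whose normalized score is >= threshold. */
    public static List<Integer> indicesAtOrAbove(float[] normalized, float threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < normalized.length; i++) {
            if (normalized[i] >= threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed raw scores, e.g. read from ScoreDoc.score for each hit.
        float[] raw = {2.5f, 1.25f, 0.5f};
        float[] normalized = normalize(raw);
        for (int i = 0; i < raw.length; i++) {
            System.out.println(raw[i] + " -> " + normalized[i]);
        }
        System.out.println("kept: " + indicesAtOrAbove(normalized, 0.5f));
    }
}
```

Keep in mind that a threshold applied this way is relative to the best hit of one particular query. The top hit always normalizes to 1 even when it is a poor match, so the same threshold is not comparable across different queries.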
