Is it possible to set a Solr score threshold "reasonably", independent of the results returned? (i.e. is Solr scoring standardized in any way?)

Question

I have a Solr index with many entries, and each query returns some subset of them, each entry (obviously) with a score. Once the results come back with their scores, I want to "keep" only the results above some score (i.e. only results of a certain quality). Is this possible when the returned subset could be anything?

I ask because on some queries a score of, say, 0.008 seems to yield a decent match, whereas on other queries a higher score yields a poor match.

Ideally, I'm just looking for a way to take the top x entries, as long as they are of at least a certain quality.
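The "top x, but only above a minimum score" rule described above can be applied on the client side once the results are back. The following is a minimal sketch. It assumes the documents were requested with `fl=*,score`, so each one carries a `score` field. The function name and the sample documents are hypothetical.

```python
from typing import Any


def top_x_above_threshold(
    docs: list[dict[str, Any]], x: int, min_score: float
) -> list[dict[str, Any]]:
    """Return at most x docs whose 'score' is >= min_score, best first.

    Hypothetical helper: assumes each doc dict has a 'score' key, as Solr
    includes when the request asks for fl=*,score.
    """
    # Solr sorts by score descending by default; sort again defensively.
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
    return [d for d in ranked if d["score"] >= min_score][:x]


# Hypothetical docs shaped like the entries of Solr's response "docs" list.
docs = [
    {"id": "a", "score": 1.7},
    {"id": "b", "score": 0.9},
    {"id": "c", "score": 0.2},
]
print([d["id"] for d in top_x_above_threshold(docs, x=2, min_score=0.5)])
```

The mechanics are trivial. The hard part is choosing `min_score`, because a raw score only has meaning relative to the query and index that produced it.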

Thanks in advance!

Recommended answer

I don't think you should do this. With the TF-IDF scoring model, there is no way to compute a score above which all results are relevant and below which none are. And even if you manage to find one, the threshold will very likely stop being valid after a few updates to your index, because document frequencies will change.
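To illustrate the point about document frequencies, here is a deliberately simplified single-term TF-IDF. It is loosely modeled on Lucene's classic similarity, but it is not Lucene's exact formula: it ignores field norms, boosts, coord and queryNorm. The threshold of 5.0 and the index sizes are arbitrary values chosen for illustration. The same document and the same query cross the threshold in opposite directions once the term becomes common in the index.

```python
import math


def simplified_tfidf(term_freq: int, doc_freq: int, num_docs: int) -> float:
    """Simplified single-term score: sqrt(tf) * idf.

    idf = 1 + ln(num_docs / (doc_freq + 1)), in the spirit of Lucene's
    classic similarity; norms, boosts, coord and queryNorm are ignored.
    """
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return math.sqrt(term_freq) * idf


THRESHOLD = 5.0  # arbitrary cut-off, for illustration only

# Same document (term occurs 3 times), same one-term query, scored before
# and after a batch of index updates makes the term far more common.
before = simplified_tfidf(term_freq=3, doc_freq=9, num_docs=1000)
after = simplified_tfidf(term_freq=3, doc_freq=399, num_docs=1200)
print(f"before: {before:.2f} kept={before >= THRESHOLD}")
print(f"after:  {after:.2f} kept={after >= THRESHOLD}")
```

Nothing about the document or the query changed. Only the collection statistics changed, and that is enough to move the score across the fixed cut-off.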

If you still want to do this, I think it is achievable using function queries: Solr provides an `if` function (in trunk) and a `query` function. Just filter your results so that you only keep entries whose score is above a given threshold.
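One commonly used variant of this filter swaps in Solr's `frange` query parser instead of `if`. It wraps `query($q)` in a filter query, which keeps only documents whose relevance score for the main query falls within the given range. The sketch below only builds the request. The base URL, core name, query and threshold are hypothetical, and the function names are illustrative. In older Solr versions the value of `query($q)` can differ slightly from the displayed score because of query normalization, so verify the result against your own setup.

```python
from urllib.parse import urlencode


def build_thresholded_params(user_query: str, min_score: float, rows: int) -> dict:
    """Hypothetical helper: Solr select params that drop low-scoring docs."""
    return {
        "q": user_query,
        # {!frange l=...} keeps docs whose function value is >= l;
        # query($q) evaluates the main query's score for each document.
        "fq": f"{{!frange l={min_score}}}query($q)",
        "fl": "*,score",  # return the score so it can be inspected
        "rows": rows,
    }


def thresholded_select_url(base_url: str, user_query: str, min_score: float, rows: int) -> str:
    """Build a /select URL; base_url is an assumed core URL."""
    params = build_thresholded_params(user_query, min_score, rows)
    return f"{base_url}/select?{urlencode(params)}"


print(thresholded_select_url("http://localhost:8983/solr/mycore", "title:solr", 0.5, 10))
```

This moves the cut-off to the server, but it does not address the caveat above. The `l` value is still a raw, query-dependent score.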
