Combine Solr's document score with a static, indexed score


Question

I have people indexed into solr based on documents that they have authored. For simplicity's sake, let's say they have three fields - an integer ID, a Text field and a floating point 'SpecialRank' (a value between 0 and 1 to indicate how great the person is). Relevance matching in solr is all done through the Text field. However, I want my final result list to be a combination of relevance to the query as provided by solr and my own SpecialRank. Namely, I need to re-rank the results based on the following formula:

finalScore = (0.8 * solrScore) + (0.2 * SpecialRank)

As far as I'm aware, this is a common task in information retrieval, as we are just combining two different scores in a weighted manner. The trouble is, I need solrScore to be normalized for this to work. What I have been doing is normalizing the solrScore based on the maxScore for a particular query and re-ranking the results client-side. This has been working OK, but means I have to retrieve all the matching documents from solr before I do my re-ranking.
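The client-side approach described above could look roughly like the following. This is only a sketch: the dictionary keys `score` and `SpecialRank`, the `rerank` helper, and the sample documents are illustrative assumptions, not the actual client code or a Solr client API.

```python
from typing import Dict, List


def rerank(docs: List[Dict], max_score: float,
           w_solr: float = 0.8, w_rank: float = 0.2) -> List[Dict]:
    """Return docs sorted by the weighted final score, highest first."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    rescored = []
    for doc in docs:
        # Dividing by the query's maxScore puts the Solr score on a 0..1 scale.
        normalized = doc["score"] / max_score
        final = w_solr * normalized + w_rank * doc["SpecialRank"]
        rescored.append({**doc, "finalScore": final})
    return sorted(rescored, key=lambda d: d["finalScore"], reverse=True)


# Hypothetical result set; in practice this would be every matching document.
docs = [
    {"id": 1, "score": 4.0, "SpecialRank": 0.1},
    {"id": 2, "score": 3.8, "SpecialRank": 0.9},
    {"id": 3, "score": 1.0, "SpecialRank": 0.5},
]
ranked = rerank(docs, max_score=4.0)
```

In this made-up data, document 2 overtakes document 1 once SpecialRank is mixed in, which is why the full result set has to be fetched before the final order is known.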

I am looking for the best way to have solr take care of this re-ranking. Are boost functions able to help here? I have read that they can be multiplicative or additive to the solr score, but since the solr score is not normalized and all over the place depending on different queries, this doesn't really seem to solve my problem. Another approach I have tried is to first query solr for a single document just to get the maxScore, and then use the following formula for the sort:

sum(product(0.8,div(score,maxScore)),product(0.2,SpecialRank))+desc

This, of course, doesn't work as you're unable to use the score as a variable in a sort function.

Am I crazy here? Surely this is a common enough task in IR. I've been banging my head against the wall for a while now, any ideas would be much appreciated.

Recommended answer

You could try implementing a custom SearchComponent that goes through the results in Solr and calculates your custom score there. Get the results from the ResponseBuilder (rb.getResults().docSet), iterate through them, add the calculated value to each result, and re-sort them.
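The computation such a component would perform can be sketched language-neutrally. The Python below is not Solr code and uses no Solr API. It assumes the component can see each matching document's ID and score and can look up SpecialRank per ID. The names `top_k_combined`, `matches`, and `special_rank` are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def top_k_combined(matches: Iterable[Tuple[int, float]],
                   special_rank: Dict[int, float],
                   k: int) -> List[Tuple[int, float]]:
    """Return the k (doc_id, final_score) pairs with the highest final score."""
    matches = list(matches)
    if not matches:
        return []
    # First pass: find the maximum score so the scores can be normalized.
    max_score = max(score for _, score in matches)

    def final_score(doc_id: int, score: float) -> float:
        norm = score / max_score if max_score > 0 else 0.0
        # A missing rank is treated as 0.0 (an assumption of this sketch).
        return 0.8 * norm + 0.2 * special_rank.get(doc_id, 0.0)

    # Second pass: combine the scores and keep only the best k documents.
    combined = ((doc_id, final_score(doc_id, s)) for doc_id, s in matches)
    return heapq.nlargest(k, combined, key=lambda pair: pair[1])


matches = [(10, 2.0), (11, 1.8), (12, 0.5)]   # (doc_id, Solr score)
special_rank = {10: 0.0, 11: 1.0, 12: 0.2}    # indexed SpecialRank values
page = top_k_combined(matches, special_rank, k=2)
```

Because the work happens next to the index, only the final page of `k` documents would need to leave the server, rather than every match.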

You can then register your SearchComponent as the last component in the RequestHandler chain:

<arr name="last-components">
  <str>elevator</str>
</arr>

More info on the Solr wiki: http://wiki.apache.org/solr/SearchComponent

Sorry, but no better idea for now.