How do I normalise a Solr/Lucene score?


Question


I am trying to work out how to improve the scoring of Solr search results. My application needs to take the score from the Solr results and display a number of "stars" depending on how well the result(s) match the query: 5 stars for an almost or exact match, down to 0 stars for a poor match, e.g. only one element hits. However, I am getting scores ranging from 1.4 down to 0.8660254, and both are results that I would give 5 stars to. What I need is some way to turn these scores into a percentage so that I can mark each result with the correct number of stars.


The query that I run that gives me the 1.4 score is:

euallowed:true AND(grade:"2:1")


The query that gives me the 0.8660254 score is:

euallowed:true AND(grade:"2:1" OR grade:"1st")


I've already updated the Similarity so that tf and idf both return 1.0, as I am only interested in whether a document has a term, not how many times the term occurs in it. This is what my Similarity code looks like:

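import org.apache.lucene.search.Similarity;

// Uses the same formulas as Lucene's DefaultSimilarity, except that tf() and idf()
// are fixed at 1.0: only the presence of a term should count, not how often it
// occurs in the document or how rare it is in the index.
public class StudentSearchSimilarity extends Similarity {

    // Shorter fields get a higher norm (same as the default).
    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Depends on the query, not the document: with idf fixed at 1.0 and no boosts,
    // sumOfSquaredWeights is simply the number of term clauses in the query.
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    @Override
    public float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Term frequency ignored: a term either matches or it doesn't.
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    // Term rarity ignored. The default would be:
    // (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0)
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    // Fraction of the query's clauses that matched.
    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}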
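With this Similarity, both reported scores can be reproduced by hand from Lucene's (pre-4.0) TF-IDF scoring formula, which shows where the gap comes from. This is a sketch under assumptions the question does not state: no boosts, and each matching field holds a single token per document, so lengthNorm (and therefore the field norm) is 1. With tf and idf fixed at 1, every matching term clause contributes exactly 1 before normalisation, and sumOfSquaredWeights is just the number of term clauses:

```latex
% Query scoring 1.4: two required term clauses, both match.
\text{queryNorm} = \frac{1}{\sqrt{1^2 + 1^2}} = \frac{1}{\sqrt{2}}, \qquad
\text{score} = \text{coord}(2,2)\cdot\frac{1}{\sqrt{2}}\,(1 + 1) = \sqrt{2} \approx 1.414

% Query scoring 0.8660254: one required term clause plus a nested OR of two
% term clauses, of which the document matches one.
\text{queryNorm} = \frac{1}{\sqrt{1^2 + (1^2 + 1^2)}} = \frac{1}{\sqrt{3}}, \qquad
\text{score} = \text{coord}(2,2)\cdot\frac{1}{\sqrt{3}}\Bigl(1 + \text{coord}(1,2)\cdot 1\Bigr)
             = \frac{1}{\sqrt{3}}\cdot\frac{3}{2} = \frac{\sqrt{3}}{2} \approx 0.8660254
```

Both documents match everything they could; the lower score comes only from queryNorm (which shrinks as clauses are added to the query) and from coord(1,2) = 0.5 on the OR group. Neither factor says anything about match quality, which is why a raw score from one query cannot be read as a percentage or compared with a score from a different query.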

So I guess my questions are:


1. What is the best way to normalise the score so that I can work out how many "stars" to give?


2. Is there another way of scoring the results?

Thanks

Grant

Recommended answer

Quoting http://wiki.apache.org/lucene-java/ScoresAsPercentages :


People frequently want to compute a "Percentage" from Lucene scores to determine what is a "100% perfect" match vs. a "50%" match. This is also sometimes called a "normalized score".

Don't do this.


Seriously. Stop trying to think about your problem this way, it's not going to end well.


That page does give an example of how you could in theory do this, but it's very hard.
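If what the stars should really express is "how many of the requested criteria does this document meet", one alternative is to compute that on the application side from the document's stored field values rather than from the Lucene score. The sketch below assumes each criterion is a field plus a set of acceptable values, and that the stored values are available as a map; the class `StarRating`, its `Criterion` type, and the `stars` method are hypothetical helpers, not part of Solr or Lucene, and the field names are only illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: rates a document by the fraction of criteria it satisfies. */
public class StarRating {

    /** One criterion: the document must have one of these values in this field. */
    public static final class Criterion {
        final String field;
        final Set<String> acceptedValues;

        public Criterion(String field, Set<String> acceptedValues) {
            this.field = field;
            this.acceptedValues = acceptedValues;
        }
    }

    /**
     * Returns 0..maxStars, proportional to the number of criteria the document meets.
     * Unlike a Lucene score, this does not depend on queryNorm or coord, so ratings
     * from different queries are on the same scale.
     */
    public static int stars(Map<String, Set<String>> docFields, List<Criterion> criteria, int maxStars) {
        if (criteria.isEmpty()) {
            return 0;
        }
        int matched = 0;
        for (Criterion c : criteria) {
            Set<String> values = docFields.getOrDefault(c.field, Set.of());
            // A criterion is met if any stored value is among the accepted ones.
            for (String v : values) {
                if (c.acceptedValues.contains(v)) {
                    matched++;
                    break;
                }
            }
        }
        return Math.round(maxStars * (float) matched / criteria.size());
    }

    public static void main(String[] args) {
        List<Criterion> query = List.of(
                new Criterion("euallowed", Set.of("true")),
                new Criterion("grade", Set.of("2:1", "1st")));

        Map<String, Set<String>> full = Map.of(
                "euallowed", Set.of("true"),
                "grade", Set.of("2:1"));
        Map<String, Set<String>> partial = Map.of(
                "euallowed", Set.of("false"),
                "grade", Set.of("1st"));

        System.out.println(stars(full, query, 5));    // prints 5
        System.out.println(stars(partial, query, 5)); // prints 3 (1 of 2 criteria: 2.5 rounds up)
    }
}
```

Solr still does the matching and ranking here; the stars are a separate, query-independent measure, which sidesteps treating the score as a percentage.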
