How do I normalise a Solr/Lucene score?

Question

I am trying to work out how to improve the scoring of Solr search results. My application needs to take the score from the Solr results and display a number of "stars" depending on how well each result matches the query: 5 stars for an exact or near-exact match, down to 0 stars for a result that does not match the search very well, e.g. only one element hits. However, I am getting scores ranging from 1.4 to 0.8660254 for results that I would give 5 stars to in both cases. What I need to do is somehow turn these scores into a percentage so that I can mark the results with the correct number of stars.

The query that gives me the 1.4 score is:

euallowed:true AND(grade:"2:1")

The query that gives me the 0.8660254 score is:

euallowed:true AND(grade:"2:1" OR Grade:"1st")

I've already updated the Similarity so that tf and idf both return 1.0, as I am only interested in whether a document contains a term, not in how many times the term occurs in the document. This is what my similarity code looks like:

import org.apache.lucene.search.Similarity;

public class StudentSearchSimilarity extends Similarity {

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {

        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));

    }

    @Override
    public float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

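    @Override
    public float tf(float freq) {
        // Constant: only whether the term occurs matters, not how often.
        return 1.0f;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        // Constant: rare terms are not weighted above common ones.
        // Previous formula: (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0)
        return 1.0f;
    }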

    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
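
The two reported scores can be reproduced by hand from the queryNorm and coord methods above, which may help show why the raw score depends on the shape of the query rather than on match quality alone. The sketch below is plain Java with no Lucene dependency. It assumes Lucene's classic TF-IDF scoring, no boosts, a field norm of 1.0 for every matching field, and a document that matches euallowed plus exactly one grade clause; the class and method names are made up for illustration.

```java
public class ScoreArithmetic {

    // Same formula as queryNorm in the Similarity above.
    public static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Same formula as coord in the Similarity above.
    public static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // euallowed:true AND(grade:"2:1")
    // Two terms, each with weight idf * boost = 1.0 (assumed), so the sum of squares is 2.
    public static double singleGradeScore() {
        double termScore = queryNorm(2.0);   // tf = idf = fieldNorm = 1.0 (assumed)
        return coord(2, 2) * (termScore + termScore);
    }

    // euallowed:true AND(grade:"2:1" OR grade:"1st")
    // Three terms, so the sum of squares is 3; only one of the two grade clauses matches.
    public static double twoGradeScore() {
        double termScore = queryNorm(3.0);
        double gradeGroup = coord(1, 2) * termScore;   // inner OR group, half matched
        return coord(2, 2) * (termScore + gradeGroup);
    }

    public static void main(String[] args) {
        System.out.println(singleGradeScore());
        System.out.println(twoGradeScore());
    }
}
```

Under those assumptions the first query works out to 2/sqrt(2), about 1.414, and the second to 1.5/sqrt(3), about 0.866, which lines up with the 1.4 and 0.8660254 reported above. Adding an optional clause changes both queryNorm and coord, so scores from different queries are not directly comparable.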

So I suppose my questions are:

  1. What is the best way of normalising the score so that I can work out how many "stars" to give?

  2. Is there another way of scoring the results?

Thanks

Grant

Recommended answer

Quoting http://wiki.apache.org/lucene-java/ScoresAsPercentages:

People frequently want to compute a "Percentage" from Lucene scores to determine what is a "100% perfect" match vs a "50%" match. This is also sometimes called a "normalized score".

Don't do this.

Seriously. Stop trying to think about your problem this way, it's not going to end well.

That page does give an example of how you could in theory do this, but it's very hard.
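
One alternative that avoids the raw score altogether is to derive the stars from how many of the search criteria a document satisfies, which is closer to the "only one element hits" description in the question. This is not the method described on the wiki page, only a minimal sketch of a different technique. It assumes the application can already count the matched criteria for each result (for example by comparing stored field values against the criteria); the StarRating class and its stars method are hypothetical names.

```java
public class StarRating {

    /**
     * Hypothetical helper: maps the fraction of matched search criteria
     * to a rating between 0 and 5 stars, rounding to the nearest star.
     */
    public static int stars(int matchedCriteria, int totalCriteria) {
        if (totalCriteria <= 0) {
            throw new IllegalArgumentException("totalCriteria must be positive");
        }
        if (matchedCriteria < 0 || matchedCriteria > totalCriteria) {
            throw new IllegalArgumentException("matchedCriteria out of range");
        }
        return (int) Math.round(5.0 * matchedCriteria / totalCriteria);
    }

    public static void main(String[] args) {
        System.out.println(stars(2, 2)); // every criterion matched
        System.out.println(stars(1, 4)); // one criterion out of four matched
    }
}
```

Because the rating depends only on the matched fraction, it stays comparable across queries with different numbers of clauses, which the raw Lucene score does not.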
