Solr/Lucene Scorer

Published: 2022/1/15 12:22:29 | Tags: lucene, solr

Question

We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring.

The problem is that they want scores that make results fall into buckets:

Bucket 1: exact match on category (score = 4)
Bucket 2: exact match on name (score = 3)
Bucket 3: partial match on category (score = 2)
Bucket 4: partial match on name (score = 1)

The first thing we did was develop a custom similarity class that returns the correct score depending on the field and on whether the match is exact or partial.

The only problem now is that when a document matches on both the category and the name, the scores are added together.

Example: searching for "restaurant" returns documents in the category restaurant that also have the word restaurant in their name. These get a score of 5 (4+1), but they should only get 4.
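
To make the arithmetic concrete, here is a minimal sketch in plain Java (no Lucene dependency) contrasting the additive combination described above with a "best clause wins" combination. The class name, constants, and method names are hypothetical and only mirror the bucket scores listed earlier.

```java
public class BucketScoreDemo {
    // Hypothetical per-clause scores, taken from the buckets in the question.
    static final float CATEGORY_EXACT = 4f;
    static final float NAME_PARTIAL = 1f;

    // Additive combination: the behavior observed in the question.
    public static float sum(float... clauseScores) {
        float total = 0f;
        for (float s : clauseScores) {
            total += s;
        }
        return total;
    }

    // Max combination: only the best-matching clause determines the score.
    public static float max(float... clauseScores) {
        float best = 0f;
        for (float s : clauseScores) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(sum(CATEGORY_EXACT, NAME_PARTIAL)); // prints 5.0
        System.out.println(max(CATEGORY_EXACT, NAME_PARTIAL)); // prints 4.0
    }
}
```

Taking the maximum is the kind of combination Lucene's DisjunctionMaxQuery provides when its tie-breaker is 0; whether that fits this setup would need to be checked against the Solr version in use.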

I assume that for this to work we would need to develop a custom Scorer class, but we have no idea how to incorporate it into Solr. Another option is to create a custom SortField implementation similar to the RandomSortField already present in Solr.

Maybe there is even a simpler solution that we don't know about.

All suggestions are welcome!

Recommended answer

You can override the scoring logic Solr uses. Solr uses the DefaultSimilarity class for scoring.

Write a class that extends DefaultSimilarity and override functions such as tf() and idf() according to your needs:

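// Package as in Lucene 3.x; in Lucene 4.x the class lives in
// org.apache.lucene.search.similarities instead.
import org.apache.lucene.search.DefaultSimilarity;

public class CustomSimilarity extends DefaultSimilarity {

  public CustomSimilarity() {
    super();
  }

  // Term-frequency factor. Returning a constant means repeated occurrences
  // of a term no longer raise the score.
  // Note: the parameter types of tf() and idf() differ between Lucene versions.
  public float tf(int freq) {
    // your code
    return 1.0f;
  }

  // Inverse-document-frequency factor. Returning a constant removes the
  // extra weight normally given to rare terms.
  public float idf(int docFreq, int numDocs) {
    // your code
    return 1.0f;
  }

}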

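After creating the class, compile it and package it as a jar.

Put the jar in the lib folder of the corresponding index or core.

Change the schema.xml of the corresponding index: <similarity class="<your package name>.CustomSimilarity"/>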

You can check out various factors affecting score here

For your requirement, you can create buckets based on the range a score falls in. Also read about field boosting, document boosting, etc.; that might help in your case.
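
The range idea can be sketched as follows. With the weights 4, 3, 2, 1 from the question, summed scores collide (4+1 and 3+2 both give 5), so this sketch assumes different, hypothetical weights of 8, 4, 2, 1, one bit per bucket, and assumes each clause contributes its weight at most once. Under those assumptions the highest set bit of the sum identifies the best bucket. The class and method names are made up for illustration and are not part of Solr or Lucene.

```java
public class BucketFromScore {
    // Hypothetical weights (not from the original post): one bit per bucket,
    // so a sum of weights can always be decoded back to the best match.
    static final int CATEGORY_EXACT = 8;
    static final int NAME_EXACT = 4;
    static final int CATEGORY_PARTIAL = 2;
    static final int NAME_PARTIAL = 1;

    /**
     * Maps a sum of the weights above to the bucket score the client wants
     * (4, 3, 2 or 1), or 0 when nothing matched.
     * Ranges: 8..15 gives 4, 4..7 gives 3, 2..3 gives 2, 1 gives 1.
     */
    public static int bucketScore(int summedWeights) {
        if (summedWeights <= 0) {
            return 0;
        }
        // Keep only the largest contributing weight: 8, 4, 2 or 1.
        int top = Integer.highestOneBit(summedWeights);
        // Convert the weight to its bucket score: 8 to 4, 4 to 3, 2 to 2, 1 to 1.
        return Integer.numberOfTrailingZeros(top) + 1;
    }

    public static void main(String[] args) {
        // Category exact match plus name partial match still lands in the top bucket.
        System.out.println(bucketScore(CATEGORY_EXACT + NAME_PARTIAL)); // prints 4
    }
}
```

The mapping would run on the client side after Solr returns the raw scores, or the raw score can simply be used for sorting, since the ordering between buckets is already correct with these weights.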
