Solr/Lucene Scorer
Problem description
We are currently working on a proof of concept for a client using Solr and have been able to configure all the features they want except the scoring.
The problem is that they want scores that make results fall into buckets:
- Bucket 1: exact match on category (score = 4)
- Bucket 2: exact match on name (score = 3)
- Bucket 3: partial match on category (score = 2)
- Bucket 4: partial match on name (score = 1)
The first thing we did was develop a custom similarity class that returns the correct score depending on the field and on whether the match is exact or partial.
The only problem now is that when a document matches on both the category and the name, the scores are added together.
Example: searching for "restaurant" returns documents in the category restaurant that also have the word restaurant in their name. These get a score of 5 (4+1), but they should only get 4.
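The arithmetic in this example can be sketched in plain Java, independent of Solr. The class and method names (`BucketScoreDemo`, `sumOfMatches`, `maxOfMatches`) are hypothetical and not part of the Lucene or Solr API. The sketch only contrasts adding the per-field scores, which is what happens now, with keeping the highest one, which is the behaviour we are after.

```java
// Hypothetical illustration only; none of these names exist in Lucene or Solr.
final class BucketScoreDemo {
    static final float EXACT_CATEGORY = 4f;
    static final float EXACT_NAME = 3f;
    static final float PARTIAL_CATEGORY = 2f;
    static final float PARTIAL_NAME = 1f;

    // Current behaviour: every matching field contributes to the total.
    static float sumOfMatches(float... fieldScores) {
        float total = 0f;
        for (float s : fieldScores) {
            total += s;
        }
        return total;
    }

    // Desired behaviour: only the best matching field decides the bucket.
    static float maxOfMatches(float... fieldScores) {
        float best = 0f;
        for (float s : fieldScores) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        // A document in category "restaurant" whose name also contains "restaurant".
        System.out.println(sumOfMatches(EXACT_CATEGORY, PARTIAL_NAME)); // prints 5.0
        System.out.println(maxOfMatches(EXACT_CATEGORY, PARTIAL_NAME)); // prints 4.0
    }
}
```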
I assume that for this to work we would need to develop a custom Scorer class, but we have no clue how to incorporate it into Solr. Another option would be to create a custom SortField implementation, similar to the RandomSortField already present in Solr.
Maybe there is an even simpler solution that we don't know about.
All suggestions are welcome!
Recommended answer
You can override the logic that the Solr scorer uses. By default, Solr uses the DefaultSimilarity class for scoring.
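import org.apache.lucene.search.DefaultSimilarity; // package and signatures as in Lucene 3.x; later versions differ

public class CustomSimilarity extends DefaultSimilarity {

    // Term-frequency factor: returning a constant means repeated terms do not raise the score.
    @Override
    public float tf(int freq) {
        // your code
        return 1.0f;
    }

    // Inverse-document-frequency factor: returning a constant means rare terms are not favoured.
    @Override
    public float idf(int docFreq, int numDocs) {
        // your code
        return 1.0f;
    }
}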
You can check out the various factors affecting the score here
For your requirement, you can create buckets based on the range a score falls in. Also read about field boosting, document boosting, and so on; these might be helpful in your case.
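A minimal sketch of the range idea, in plain Java with no Solr dependency. It rests on assumptions that are not in the question: the weights are changed from 4/3/2/1 to powers of two (8/4/2/1), each kind of match contributes at most once, and the custom similarity yields exactly these values. With the original weights a summed score of 3 would be ambiguous (exact name, or partial category plus partial name). `ScoreBuckets` and `bucketOf` are hypothetical names, not Lucene or Solr APIs.

```java
// Hypothetical helper; not part of Lucene or Solr.
final class ScoreBuckets {
    // Assumed weights: powers of two instead of the original 4/3/2/1, so that
    // the sum of all lower weights never reaches the next higher weight.
    static final int EXACT_CATEGORY = 8;
    static final int EXACT_NAME = 4;
    static final int PARTIAL_CATEGORY = 2;
    static final int PARTIAL_NAME = 1;

    // Maps a summed score to bucket 1..4, or 0 when nothing matched.
    static int bucketOf(int summedScore) {
        if (summedScore >= EXACT_CATEGORY) return 1;   // range 8..15
        if (summedScore >= EXACT_NAME) return 2;       // range 4..7
        if (summedScore >= PARTIAL_CATEGORY) return 3; // range 2..3
        if (summedScore >= PARTIAL_NAME) return 4;     // exactly 1
        return 0;
    }

    public static void main(String[] args) {
        // Exact category plus partial name still lands in bucket 1.
        System.out.println(bucketOf(EXACT_CATEGORY + PARTIAL_NAME));   // prints 1
        // Partial category plus partial name stays below the exact-name bucket.
        System.out.println(bucketOf(PARTIAL_CATEGORY + PARTIAL_NAME)); // prints 3
    }
}
```

The bucket number, rather than the raw score, would then be what results are grouped or sorted by.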