Solr/Lucene Scorer
Problem description
We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring.
The problem is that they want scores that make results fall into buckets:
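- Bucket 1: exact match on category (score = 4)
- Bucket 2: exact match on name (score = 3)
- Bucket 3: partial match on category (score = 2)
- Bucket 4: partial match on name (score = 1)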
The first thing we did was develop a custom similarity class that returns the correct score depending on the field and on whether the match is exact or partial.
The only problem now is that when a document matches on both the category and the name, the scores are added together.
Example: searching for "restaurant" returns documents in the category restaurant that also have the word restaurant in their name. These get a score of 5 (4 + 1), but they should only get 4.
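To make the arithmetic concrete, here is a minimal sketch in plain Java. It does not touch any Solr or Lucene API; the class and method names are hypothetical and only contrast the current behaviour (per-field scores summed) with the desired one (only the best per-field score counts).

```java
// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class BucketScoreExample {

    // Current behaviour: the scores of all matching fields are added together.
    static float sum(float... fieldScores) {
        float total = 0f;
        for (float score : fieldScores) {
            total += score;
        }
        return total;
    }

    // Desired behaviour: only the highest-scoring field determines the bucket.
    static float max(float... fieldScores) {
        float best = 0f;
        for (float score : fieldScores) {
            best = Math.max(best, score);
        }
        return best;
    }
}
```

For the "restaurant" document, the per-field scores are 4 (exact category match) and 1 (partial name match): `sum(4f, 1f)` gives the unwanted 5, while `max(4f, 1f)` gives the 4 that places the document in bucket 1.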
I assume that for this to work we would need to develop a custom Scorer class, but we have no clue how to incorporate it in Solr. Another option is to create a custom SortField implementation similar to the RandomSortField already present in Solr.
Maybe there is an even simpler solution that we don't know about.
All suggestions are welcome!
Recommended answer
You can override the logic that the Solr scorer uses. Solr uses the DefaultSimilarity class for scoring, so you can extend it:
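// Assumes Lucene 3.x, where DefaultSimilarity lives in this package and
// tf/idf take int arguments; later versions moved the class and changed the signatures.
import org.apache.lucene.search.DefaultSimilarity;

public class CustomSimilarity extends DefaultSimilarity {

    public CustomSimilarity() {
        super();
    }

    // Term frequency factor: returning a constant ignores how often the term occurs.
    @Override
    public float tf(int freq) {
        // your code
        return 1.0f;
    }

    // Inverse document frequency factor: returning a constant ignores term rarity.
    @Override
    public float idf(int docFreq, int numDocs) {
        // your code
        return 1.0f;
    }
}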
Then register the class in schema.xml:
<similarity class="<your package name>.CustomSimilarity"/>
You can check out the various factors affecting the score here.
For your requirement, you can create buckets depending on whether the score falls in a specific range. Also read about field boosting, document boosting, etc.; that might be helpful in your case.
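As a rough sketch of the range idea, the helper below maps a summed score to a bucket. Note that with the weights 4/3/2/1 from the question, sums can collide (4 + 1 and 3 + 2 both give 5), so this sketch assumes the per-match weights are changed to 8/4/2/1, where the sum of all lower weights never reaches the next higher one. Those weights, the class name, and the method name are assumptions made for illustration, not something Solr provides.

```java
// Hypothetical helper for illustration only; not part of Solr or Lucene.
// Assumed per-match weights: exact category = 8, exact name = 4,
// partial category = 2, partial name = 1.
final class ScoreBuckets {

    // Returns the bucket (1 is best) for a summed score, or 0 if nothing matched.
    static int bucketFor(float score) {
        if (score >= 8f) {
            return 1; // an exact category match contributed to the sum
        }
        if (score >= 4f) {
            return 2; // best match is an exact name match
        }
        if (score >= 2f) {
            return 3; // best match is a partial category match
        }
        if (score >= 1f) {
            return 4; // only a partial name match
        }
        return 0; // no match at all
    }
}
```

With these assumed weights, the "restaurant" document from the question scores 8 + 1 = 9 and still lands in bucket 1, while a document with an exact name match and a partial category match scores 4 + 2 = 6 and lands in bucket 2.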