How to make modifications to SOLR's tfidf similarity?

Question

I am trying to search titles, so the mere presence of a word is sufficient; its frequency is not relevant, at least for my use case.

For example, the search query is: "board early with my pets"

The results I got are:

Result 1: Pets (score 2.3924026)

Result 2: Pets Counts against in cabin pet limit (score 2.0538325)

Result 3: Pets Preboarding allowed (score 1.6092906)

Ideally I want Result 3 to come out on top, which needs some external work. Result 1 is obvious and acceptable, but Result 2 scores 2.05 because "pet" is mentioned twice, which implies a higher tf value [2/4 (after removing stop words)]. My requirement is just to detect the presence of a word, not to take its frequency into account.

How can I achieve this?

Recommended answer

If you don't need phrase search or other functionality that depends on position data being indexed, you can set omitTermFreqAndPositions="true" on the field in question. In that case no position or frequency information is stored for the terms.
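As a rough illustration, the attribute goes on the field definition in the schema. The field name `title` and the type `text_general` below are assumptions for the sake of the example, not taken from the question; existing documents would need to be reindexed for the change to take effect.

```xml
<!-- schema fragment (sketch); "title" and "text_general" are hypothetical names -->
<field name="title" type="text_general" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```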

If that's not an option, you can create a dummy similarity class that extends DefaultSimilarity and returns 1.0f for tf. An example of this can be found in Solr Custom Similarity.
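To see why a constant tf removes the advantage of repeated words, here is a minimal standalone sketch. It does not use Lucene: `classicTf` assumes the square-root tf of Lucene's classic TF-IDF similarity, `binaryTf` mimics the presence-only behaviour, and the token lists are hypothetical analyzer output for Results 2 and 3. In a real deployment the same idea would live in a subclass of DefaultSimilarity that overrides its tf method, compiled against the Lucene jars.

```java
import java.util.Arrays;
import java.util.List;

public class PresenceOnlyTf {
    // Assumed classic-style tf: grows with the square root of the term count.
    static double classicTf(int freq) {
        return Math.sqrt(freq);
    }

    // Presence-only tf: 1 if the term occurs at all, 0 otherwise.
    static double binaryTf(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    // Counts how often a term appears in an already-analyzed token list.
    static int count(List<String> tokens, String term) {
        int n = 0;
        for (String t : tokens) {
            if (t.equals(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical tokens after stop-word removal and stemming.
        List<String> result2 = Arrays.asList("pet", "count", "cabin", "pet", "limit");
        List<String> result3 = Arrays.asList("pet", "preboard", "allow");

        int f2 = count(result2, "pet");
        int f3 = count(result3, "pet");

        System.out.println("classic tf: result2=" + classicTf(f2) + " result3=" + classicTf(f3));
        System.out.println("binary tf:  result2=" + binaryTf(f2) + " result3=" + binaryTf(f3));
    }
}
```

With the presence-only variant both documents contribute the same tf for "pet", so any remaining score difference comes from other factors such as idf and length normalization.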

You can also configure a different similarity class for each field, which lets you drop tf scoring for a single field only.
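A per-field setup might look roughly like the fragment below. This is a sketch under assumptions: `com.example.NoTfSimilarity` is a hypothetical custom class that returns a constant tf, `text_title` is a made-up type name, and the analyzer chain is only a placeholder. Some Solr versions need the global `solr.SchemaSimilarityFactory` declared for per-type similarities to be honoured, so check the documentation for your version.

```xml
<!-- Global similarity that delegates to whatever each fieldType declares -->
<similarity class="solr.SchemaSimilarityFactory"/>

<!-- Hypothetical field type used only for titles -->
<fieldType name="text_title" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Hypothetical custom class; other field types keep their default scoring -->
  <similarity class="com.example.NoTfSimilarity"/>
</fieldType>
```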

A third option is to use the constant scoring operator for the part of your query that should have a constant score. I'm not sure whether the edismax parser supports this.
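As a sketch of what such a query could look like, the helper below builds a query string in which every term clause carries a constant score. It assumes the `^=` constant-score syntax of Solr's standard query parser, which is not available in every Solr version; the class name, the `title` field, and the choice of 1 as the constant are illustrative only, and terms are not escaped.

```java
import java.util.StringJoiner;

public class ConstantScoreQuery {
    // Builds clauses of the form field:term^=1, separated by spaces.
    // Assumes the terms contain no characters that need query escaping.
    static String build(String field, String... terms) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            query.add(field + ":" + term + "^=1");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Stop words are left out by hand here; the analyzer would normally drop them.
        System.out.println(build("title", "board", "early", "pets"));
    }
}
```

Each matching clause then contributes the same fixed amount regardless of how often the term appears in the title.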
