Disable IDF calculation


Question


In my particular use case, the IDF factor that gets calculated as part of the TF-IDF algorithm messes up the scoring for my queries. Basically, I want the queries to take only the term frequency into account. Is it possible to disable the IDF factor, i.e. set it to 1, for a particular index? I have looked into the similarity module (in version 0.90.X), but haven't really found anything that could help; the same goes for the function_score query. Do I need to write a custom Similarity class in Java? Or is there a plugin for what I'm trying to achieve?

Solution

What about using a constant_score query?

See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/ignoring-tfidf.html
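To make the suggestion concrete, here is a minimal sketch of a request body that wraps a term filter in constant_score. The field name `title` and the value `elasticsearch` are made up for illustration, and the helper function is hypothetical, not part of any client library. Keep in mind that constant_score gives every matching document the same score (the boost), so it flattens term frequency as well as IDF.

```python
import json


def constant_score_query(field, value, boost=1.0):
    """Build a search body that wraps a term filter in constant_score.

    `field` and `value` are placeholders for your own mapping.
    Every matching document receives `boost` as its score.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {"term": {field: value}},
                "boost": boost,
            }
        }
    }


# Hypothetical field and term, for illustration only.
body = constant_score_query("title", "elasticsearch")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against your index.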

Don't hesitate to add ?explain=true to the search request to see how each score is computed.
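As a rough sketch, the search URL with the explain parameter could be built like this. The host `http://localhost:9200` and the index name `my_index` are assumptions, and the helper is hypothetical.

```python
from urllib.parse import urlencode


def explain_search_url(base_url, index):
    """Return the _search URL for `index` with explain=true appended."""
    return f"{base_url}/{index}/_search?" + urlencode({"explain": "true"})


# Assumed local node and hypothetical index name.
url = explain_search_url("http://localhost:9200", "my_index")
print(url)
```

Each hit in the response should then carry an explanation of how its score was derived, which makes it easy to compare the scoring with and without the constant_score wrapper.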

As you can see here, without constant_score:

[screenshot of explain output]

And with a constant_score query (that wraps your real query):

[screenshot of explain output]
