When rewriting a multi-term query, add constant_score to every term, not to the whole query


Problem description

I am looking up cities from the GeoNames database. A typical search string would be "San Francisco CA". My documents have a city field and a state field. I run a match query with the search string against each field, then combine the matches using bool:

"query" : {
    "bool" : {
        "must" : {
            "match" : {
                "state" : {
                    "query" : "San Francisco CA"
                }
            }
        },
        "should" : {
            "match" : {
                "city" : {
                    "query" : "San Francisco CA"
                }
            }
        }
    }
}

I have these two documents in my db:

{"city" : "San Francisco", "state" : "CA"}
{"city" : "San Marino", "state" : "San Marino"}

The problem is that matching "san" to San Marino's state scores much higher than matching "CA" to San Francisco's state, because there are many cities with state "CA" and very few cities with state "San Marino".
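To make the effect concrete, here is a minimal sketch of the IDF calculation, assuming the index uses Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x), where IDF is 1 + ln(numDocs / (docFreq + 1)). The document counts are made-up numbers chosen only for illustration, not real GeoNames statistics.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF in the classic TF-IDF formulation: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index statistics, chosen only for illustration.
NUM_DOCS = 100_000
DOCS_WITH_CA_IN_STATE = 5_000   # "ca" is a common term in the state field
DOCS_WITH_SAN_IN_STATE = 3      # "san" is a rare term in the state field

idf_ca = classic_idf(DOCS_WITH_CA_IN_STATE, NUM_DOCS)
idf_san = classic_idf(DOCS_WITH_SAN_IN_STATE, NUM_DOCS)

# The rare term gets a much larger weight than the common one.
print(f"idf(state:ca)  = {idf_ca:.2f}")
print(f"idf(state:san) = {idf_san:.2f}")
```

Because the rare term's weight is several times larger, a single accidental match on "san" in the state field can outweigh the intended match on "CA".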

I tried to disable IDF using constant_score, but that leads to another problem: matching "San Francisco CA" to "San Francisco", where two terms match, gets the same score as matching "San Francisco CA" to "San Marino", where only one term matches. When a multi-term match query is rewritten into separate term queries, is it possible to apply constant_score to each of the rewritten queries, so that I get a score of 2 for matching "San Francisco" and a score of 1 for matching just "San"?
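The difference between the two behaviours can be sketched outside Elasticsearch. This is a toy model, not Elasticsearch code: `analyze`, `whole_query_constant`, and `per_term_constant` are hypothetical helpers, and the analyzer is assumed to simply lowercase and split on whitespace.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matched_terms(query, field_value):
    """Number of distinct query terms that occur in the field."""
    return len(set(analyze(query)) & set(analyze(field_value)))


def whole_query_constant(query, field_value):
    """Constant score around the whole match: 1.0 if anything matches, else 0.0."""
    return 1.0 if matched_terms(query, field_value) > 0 else 0.0


def per_term_constant(query, field_value):
    """Desired behaviour: a constant 1.0 for each matching term."""
    return float(matched_terms(query, field_value))


QUERY = "San Francisco CA"
for city in ("San Francisco", "San Marino"):
    print(city, whole_query_constant(QUERY, city), per_term_constant(QUERY, city))
```

With the constant applied to the whole query, both cities score the same. With the constant applied per term, "San Francisco" scores 2.0 and "San Marino" scores 1.0, which is the ranking asked for above.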

Solution

With kind help from the Elasticsearch discussion forum, I have a solution.

The easiest way to make IDF constant is to create a custom class for the similarity calculation. Here is my updated example for Elasticsearch 1.7.0.

The class forces IDF to always equal 1, which solves my problem.
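The original example class is not reproduced here. As a rough illustration of why a constant IDF helps, the following Python sketch models a heavily simplified TF-IDF scorer and a subclass that overrides only the IDF. `TfIdfSketch` and `ConstantIdfSketch` are hypothetical names, the document frequencies are invented, and the model ignores field norms, the coordination factor, and query normalization, so real Elasticsearch scores will differ.

```python
import math


class TfIdfSketch:
    """Simplified TF-IDF scorer: ignores norms, coord, and query normalization."""

    def idf(self, doc_freq, num_docs):
        return 1.0 + math.log(num_docs / (doc_freq + 1))

    def score(self, query, field_value, doc_freqs, num_docs):
        field_terms = field_value.lower().split()
        total = 0.0
        for term in set(query.lower().split()):
            tf = math.sqrt(field_terms.count(term))  # 0.0 when the term is absent
            idf = self.idf(doc_freqs.get(term, 0), num_docs)
            total += tf * idf * idf  # classic TF-IDF weights each term by idf squared
        return total


class ConstantIdfSketch(TfIdfSketch):
    """Same scorer, but every term gets an IDF of exactly 1."""

    def idf(self, doc_freq, num_docs):
        return 1.0


# Hypothetical statistics for the state field, for illustration only.
NUM_DOCS = 100_000
STATE_DOC_FREQS = {"ca": 5_000, "san": 3, "marino": 3}
QUERY = "San Francisco CA"

for sim in (TfIdfSketch(), ConstantIdfSketch()):
    ca = sim.score(QUERY, "CA", STATE_DOC_FREQS, NUM_DOCS)
    san_marino = sim.score(QUERY, "San Marino", STATE_DOC_FREQS, NUM_DOCS)
    print(type(sim).__name__, round(ca, 2), round(san_marino, 2))
```

In this simplified model, once the IDF is constant each matching term contributes the same amount, so the state match on "CA" no longer loses to the accidental match on "san", and a field that matches two query terms still scores higher than one that matches a single term.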
