When rewriting multiterm query, add constant_score to every term, not to the whole query
I am searching for cities from the geonames db. A typical search string would be "San Francisco CA". My documents have a city field and a state field. I run a match query with the search string against each of the two fields, then combine these matches using bool:
"query" : {
    "bool" : {
        "must" : {
            "match" : {
                "state" : {
                    "query" : "San Francisco CA"
                }
            }
        },
        "should" : {
            "match" : {
                "city" : {
                    "query" : "San Francisco CA"
                }
            }
        }
    }
}
I have these two documents in my db:
{"city" : "San Francisco", "state" : "CA"}
{"city" : "San Marino", "state" : "San Marino"}
The problem is that matching "San" to San Marino's state scores much higher than matching "CA" to San Francisco's state, because there are many cities with state "CA" and very few cities with state "San Marino".
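To make the effect concrete, here is a minimal sketch of the classic Lucene TF-IDF inverse document frequency, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts below are made-up numbers chosen for illustration, not figures from geonames, and the function name is hypothetical.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF-IDF formula: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts for the "state" field (assumptions, for illustration only).
NUM_DOCS = 100_000
DOC_FREQ = {
    "ca": 5_000,  # many cities have state "CA"
    "san": 20,    # very few cities have a state containing "San"
}

idf_ca = classic_idf(DOC_FREQ["ca"], NUM_DOCS)
idf_san = classic_idf(DOC_FREQ["san"], NUM_DOCS)

# The rarer term gets the larger weight, so a "san" match outranks a "ca" match.
print(f"idf(ca)  = {idf_ca:.2f}")
print(f"idf(san) = {idf_san:.2f}")
```

Under these assumed counts the rare term "san" receives a noticeably larger weight than the common term "ca", which is the ranking inversion described above.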
I tried to disable IDF using constant_score, but that leads to another problem: matching "San Francisco CA" against "San Francisco", where two terms match, gets the same score as matching "San Francisco CA" against "San Marino", where only one term matches. When a multiterm match query is rewritten into separate term queries, is it possible to wrap each of the rewritten queries in constant_score, so that I get a score of 2 for matching "San Francisco" and a score of 1 for matching just "San"?
With kind help from the ElasticSearch discussion forum, I have a solution.
The easiest way to make IDF constant is to create a custom class for similarity calculation. Here is my updated example for ElasticSearch 1.7.0.
The class forces IDF to always equal 1, which solves my problem.
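The original example class is not reproduced here. As a rough illustration of why a constant IDF gives the desired ranking, the following Python sketch models a heavily simplified scorer in which each matched query term contributes exactly its IDF. The tokenizer, function names, and scoring model are assumptions for illustration. Real Lucene scoring also involves term frequency, field norms, and query normalization, which this sketch ignores. In Lucene 4.x, the equivalent change would presumably be a subclass of DefaultSimilarity that overrides idf(long docFreq, long numDocs) to return 1.

```python
def tokenize(text):
    """Naive lowercase whitespace tokenizer (a stand-in for a real analyzer)."""
    return text.lower().split()


def constant_idf(term):
    """IDF forced to 1 for every term, as the custom similarity class does."""
    return 1.0


def match_score(query, field_value, idf=constant_idf):
    """Simplified score: sum of IDF over distinct query terms found in the field."""
    field_terms = set(tokenize(field_value))
    return sum(idf(term) for term in set(tokenize(query)) if term in field_terms)


docs = [
    {"city": "San Francisco", "state": "CA"},
    {"city": "San Marino", "state": "San Marino"},
]
query = "San Francisco CA"

# Score the city field of each document against the full search string.
city_scores = {doc["city"]: match_score(query, doc["city"]) for doc in docs}
for city, score in city_scores.items():
    print(city, score)
```

With every term weighted equally, the score in this model reduces to the number of matched terms, so a field matching two query terms ranks above a field matching one.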