Different results for same query in Elasticsearch Cluster

Problem description


I have created an Elasticsearch cluster with 3 nodes, with 3 shards and 2 replicas. The same query returns different results when run against the same index with the same data. Right now the results are sorted by the _score field in descending order (I think that is the default sort), and the requirement is also that results be sorted in descending order of their score. So my question is: why does the same query yield different results, and how can this be corrected so that the same query returns the same results every time?

Query attached:

    {
        "from": 0,
        "size": 10,
        "query": {
            "bool": {
                "must": {
                    "bool": {
                        "must": {
                            "terms": {
                                "context": [
                                    "my name"
                                ]
                            }
                        },
                        "should": {
                            "multi_match": {
                                "query": "test",
                                "fields": [
                                    "field1^2",
                                    "field2^2",
                                    "field3^3"
                                ]
                            }
                        },
                        "minimum_should_match": "1"
                    }
                },
                "filter": {
                    "bool": {
                        "must": [
                            {
                                "terms": {
                                    "audiencecomb": [
                                        "1235"
                                    ]
                                }
                            },
                            {
                                "terms": {
                                    "consumablestatus": [
                                        "1"
                                    ]
                                }
                            }
                        ],
                        "minimum_should_match": "1"
                    }
                }
            }
        }
    }

Thanks,
Ashit

Solution

One possible reason is distributed IDF. By default, Elasticsearch uses a local IDF on each shard to save on performance, which can lead to different IDF values across the cluster. So you should try ?search_type=dfs_query_then_fetch, which explicitly asks Elasticsearch to compute a global IDF. The relevant documentation explains:

However, for performance reasons, Elasticsearch doesn’t calculate the IDF across all documents in the index. Instead, each shard calculates a local IDF for the documents contained in that shard.

Because our documents are well distributed, the IDF for both shards will be the same. Now imagine instead that five of the foo documents are on shard 1, and the sixth document is on shard 2. In this scenario, the term foo is very common on one shard (and so of little importance), but rare on the other shard (and so much more important). These differences in IDF can produce incorrect results.
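To make the skew concrete, here is a small sketch of the arithmetic. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); newer Elasticsearch versions default to BM25, whose IDF formula differs but shows the same effect. The shard sizes (10 documents each) are made up for illustration, and `classic_idf` is a hypothetical helper, not an Elasticsearch API.

```python
import math


def classic_idf(doc_count: int, doc_freq: int) -> float:
    """IDF as in Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Assumed layout: two shards with 10 documents each.
# Five "foo" documents live on shard 1, the sixth on shard 2.
idf_shard1 = classic_idf(doc_count=10, doc_freq=5)  # foo is common here
idf_shard2 = classic_idf(doc_count=10, doc_freq=1)  # foo is rare here
idf_global = classic_idf(doc_count=20, doc_freq=6)  # whole index

print(f"shard 1 local IDF: {idf_shard1:.4f}")
print(f"shard 2 local IDF: {idf_shard2:.4f}")
print(f"global IDF:        {idf_global:.4f}")
```

The same term gets a noticeably higher weight on shard 2 than on shard 1, so an otherwise identical document would score differently depending on which shard holds it.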

In practice, this is not a problem. The differences between local and global IDF diminish the more documents that you add to the index. With real-world volumes of data, the local IDFs soon even out. The problem is not that relevance is broken but that there is too little data.

For testing purposes, there are two ways we can work around this issue. The first is to create an index with one primary shard, as we did in the section introducing the match query. If you have only one shard, then the local IDF is the global IDF.
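As a rough sketch of the first workaround, the snippet below builds (but does not send) a create-index request with a single primary shard. The host `http://localhost:9200`, the index name `my_test_index`, and the helper `build_create_index_request` are assumptions for illustration; `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
import json
from urllib.request import Request


def build_create_index_request(base_url: str, index: str,
                               primary_shards: int = 1,
                               replicas: int = 0) -> Request:
    """Build a PUT request that creates an index with the given shard layout."""
    body = {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }
    return Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name; pass the result to urllib.request.urlopen to send it.
create_req = build_create_index_request("http://localhost:9200", "my_test_index")
print(create_req.get_method(), create_req.full_url)
print(create_req.data.decode("utf-8"))
```

With one primary shard every document is scored against the same term statistics, which makes this suitable for tests but not a general fix for a production cluster.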

The second workaround is to add ?search_type=dfs_query_then_fetch to your search requests. The dfs stands for Distributed Frequency Search, and it tells Elasticsearch to first retrieve the local IDF from each shard in order to calculate the global IDF across the whole index.
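A minimal sketch of the second workaround, using only the Python standard library: it builds a search request with `search_type=dfs_query_then_fetch` appended to the URL. The host, the index name `my_index`, the simplified query body, and the helper `build_search_request` are assumptions for illustration; the request is constructed but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, index: str, query_body: dict,
                         search_type: str = "dfs_query_then_fetch") -> Request:
    """Build a _search request that asks for globally computed term statistics."""
    params = urlencode({"search_type": search_type})
    return Request(
        f"{base_url}/{index}/_search?{params}",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Simplified stand-in for the query in the question.
query = {
    "from": 0,
    "size": 10,
    "query": {"multi_match": {"query": "test",
                              "fields": ["field1^2", "field2^2", "field3^3"]}},
}

search_req = build_search_request("http://localhost:9200", "my_index", query)
print(search_req.full_url)
# prints: http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch
```

The extra round trip to collect term frequencies from every shard adds latency, so it is worth measuring before enabling it for every request.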

For more information take a look here
