Different Elasticsearch results for the same query
Question
I've set up Elasticsearch as 1 cluster with 4 nodes. Number of shards per index: 1; number of replicas per index: 3.
When I call a simple query like the following one multiple times, I get different results (different total hits and different top 10 documents):
http://localhost:9200/index_name/_search?q=term
Is the data different on each shard? I would like all shards to be up to date. What can I do?
This is the result of /_cluster/health:
{
"cluster_name" : "secret",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 24,
"active_shards" : 96,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
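As a quick sanity check on the health output above, the shard counts can be compared with the configured replica count. This is a minimal sketch that assumes every index uses 1 primary shard and 3 replicas, as stated in the question; the variable names are illustrative only.

```python
import json

# The /_cluster/health response from the question, pasted as a string.
health = json.loads("""
{
  "status": "green",
  "number_of_nodes": 4,
  "active_primary_shards": 24,
  "active_shards": 96,
  "unassigned_shards": 0
}
""")

# Assumption from the question: 3 replicas per index, so 1 primary + 3 replicas.
copies_per_shard = 1 + 3
expected_active = health["active_primary_shards"] * copies_per_shard

# True when every primary and every replica is allocated and active.
all_copies_active = (
    health["active_shards"] == expected_active
    and health["unassigned_shards"] == 0
)
print(expected_active, all_copies_active)
```

If the check holds, every copy is allocated, which suggests the differing results do not come from missing or unassigned replicas.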
As a temporary solution I rebuild the index through the Ruby gem tire: ModelName.rebuild_index
But I need a long-term solution.
Recommended answer
We ran into a similar problem, and it turned out to be because Elasticsearch round-robins between the different copies of a shard (the primary and its replicas) when searching. Each copy returns a slightly different _score because of slightly different indexing, due to the way ES handles deleted documents in an index. In our case this meant that similar results were often placed slightly lower or higher in the result order and, when combined with pagination (using from and size in the search query), the same results turned up on two separate "pages" or did not appear on any page at all.
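The pagination effect described above can be reproduced with a small simulation. This is only an illustrative sketch: the document names and scores are made up, and the search function is a stand-in for from/size paging, not Elasticsearch code.

```python
# Two copies of the same shard hold the same documents, but one score
# differs slightly (made-up numbers standing in for the deleted-documents effect).
copy_a = {"doc1": 1.00, "doc2": 0.90, "doc3": 0.80, "doc4": 0.70}
copy_b = {"doc1": 1.00, "doc2": 0.79, "doc3": 0.80, "doc4": 0.70}


def search(scores, from_, size):
    """Rank documents by descending score and return one page of IDs."""
    ranked = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return ranked[from_:from_ + size]


page1 = search(copy_a, from_=0, size=2)  # first request served by copy A
page2 = search(copy_b, from_=2, size=2)  # next request served by copy B

duplicated = set(page1) & set(page2)              # shown on both pages
missing = set(copy_a) - set(page1) - set(page2)   # shown on no page
print(page1, page2, duplicated, missing)
```

Here doc2 is ranked second on one copy and third on the other, so it lands on both pages while doc3 is never shown.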
We found an Elasticsearch article on consistent scoring which explains this quite neatly, and implemented a preference parameter to ensure that we always get the same scores for a particular search by querying the same shard copies:
http://localhost:9200/index_name/_search?q=term&preference=blablabla
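A request like the one above could be built as follows. This is a minimal sketch: build_search_url is a hypothetical helper, and using a per-user session ID as the preference value is an assumption for illustration. The value is an arbitrary string; what matters is that the same string is sent for every page of the same search.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, term, preference):
    """Build a URI search request that pins the search to a preference string.

    Hypothetical helper: `preference` is any stable string, for example a
    session ID, so that repeated requests are served by the same shard copies.
    """
    query = urlencode({"q": term, "preference": preference})
    return f"{base_url}/{index}/_search?{query}"


# Assumed example value: one session ID reused for all pages of a search.
url = build_search_url("http://localhost:9200", "index_name", "term", "session-1234")
print(url)
```

Tying the value to a user or session keeps each user's pages consistent while still spreading different users across the available copies.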
We also thought about using sorting, but Elasticsearch sorts results with the same score by an internal Lucene document ID, ensuring that results with the same score are always returned in the same order.