Elasticsearch read and write consistency


Question

Elasticsearch doesn't have a "read consistency" parameter (like Cassandra), but it does have "write consistency" and "read preference".

The documentation says the following about write consistency:

Write Consistency
To prevent writes from taking place on the "wrong" side of a network partition, by default, index operations only succeed if a quorum (>replicas/2+1) of active shards are available. This default can be overridden on a node-by-node basis using the action.write_consistency setting. To alter this behavior per-operation, the consistency request parameter can be used.

Valid write consistency values are one, quorum, and all.

Note, for the case where the number of replicas is 1 (total of 2 copies of the data), then the default behavior is to succeed if 1 copy (the primary) can perform the write.

The index operation only returns after all active shards within the replication group have indexed the document (sync replication).
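
To make the quorum rule concrete, here is a minimal Python sketch derived from the quoted description (including the single-replica special case). This is an illustration only, not Elasticsearch source code; the function name and structure are my own:

```python
def required_active_copies(number_of_replicas, consistency="quorum"):
    """Illustrative only: how many shard copies must be active for a write
    to be accepted, per the write-consistency rules quoted above."""
    total_copies = 1 + number_of_replicas  # primary + replicas

    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    # "quorum": more than half of the copies, but with a single replica
    # (2 copies total) the primary alone is enough, as noted in the docs.
    if total_copies <= 2:
        return 1
    return total_copies // 2 + 1


# 1 replica  (2 copies) -> 1   (the primary alone can accept the write)
# 2 replicas (3 copies) -> 2
# 4 replicas (5 copies) -> 3
for replicas in (1, 2, 4):
    print(replicas, required_active_copies(replicas))
```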

My question is about the last paragraph:

The index operation only returns after all active shards within the replication group have indexed the document (sync replication).

If write_consistency=quorum (the default) and all shards are live (no node failures, no network partition), then:
1) Does the index operation return as soon as a quorum of shards has finished indexing (even though all shards are live/active)?
2) Or does the index operation return only when all live/active shards have finished indexing (i.e. quorum matters only in case of failures/timeouts)?

In the first case, reads may be eventually consistent (they may return stale data) but writes are quicker.
In the second case, reads are consistent (as long as there are no network partitions) but writes are slower, since they wait for the slowest shard/node.

Does anyone know how it works?

Another thing I wonder about is why the default value of the preference parameter (in get/search requests) is randomized rather than _local, which I suppose would be more efficient.

Answer

I think I can answer my own question now :)

Regarding the first question, by re-re-reading the documentation (this and this) a few times :) I realized that this statement should be right:

The index operation returns when all live/active shards have finished indexing, regardless of the consistency param. The consistency param may only prevent the operation from starting if there are not enough available shards (nodes).

So for example, if there are 3 shards (one primary and two replicas), and all of them are available, the operation will wait for all 3 (since all 3 are live/available), regardless of the consistency param (even when consistency=one).
This makes the system consistent (at least the document-API part), unless there is a network partition. But I haven't had a chance to test this yet.
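
As a concrete illustration, here is a hedged sketch using the plain REST API. The host, index name, type and document are placeholders, and the consistency request parameter belongs to the older Elasticsearch releases this question is about:

```python
import requests

ES = "http://localhost:9200"   # assumed local node
doc = {"user": "kimchy", "message": "testing write consistency"}

# consistency=one only lowers how many shard copies must be *available*
# for the write to be accepted; per the documentation quoted above, the
# call still returns only after all active shards in the replication
# group have indexed the document (sync replication).
resp = requests.put(
    f"{ES}/my_index/my_type/1",        # index/type/id are placeholders
    params={"consistency": "one"},     # valid values: one, quorum, all
    json=doc,
)
print(resp.status_code, resp.json())
```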

UPDATE: by consistency here, I don't mean ACID consistency; it is just the guarantee that all replicas are updated at the moment the request returns.

Regarding the second question: the obvious answer is that it is randomized to spread the load; on the other hand, a client can pick a random node to talk to, but that is probably not 100% efficient, since a single request may need multiple shards.
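
For reference, the preference can also be set per request; a minimal sketch with requests (host and index name are placeholders):

```python
import requests

ES = "http://localhost:9200"   # assumed local node
query = {"query": {"match_all": {}}}

# Default: no preference -> the coordinating node picks shard copies in a
# randomized/round-robin fashion, which spreads the load across the cluster.
default_resp = requests.post(f"{ES}/my_index/_search", json=query)

# preference=_local -> prefer shard copies allocated on the node that
# receives the request; copies on other nodes are used only when needed.
local_resp = requests.post(
    f"{ES}/my_index/_search",
    params={"preference": "_local"},
    json=query,
)
print(default_resp.status_code, local_resp.status_code)
```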

