Elasticsearch read and write consistency
Question
Elasticsearch doesn't have a "read consistency" parameter (like Cassandra), but it does have "write consistency" and "read preference".
The documentation says the following about write consistency:
Write Consistency
To prevent writes from taking place on the "wrong" side of a network partition, by default, index operations only succeed if a quorum (>replicas/2+1) of active shards are available. This default can be overridden on a node-by-node basis using the action.write_consistency setting. To alter this behavior per-operation, the consistency request parameter can be used.
Valid write consistency values are one, quorum, and all.
Note, for the case where the number of replicas is 1 (total of 2 copies of the data), then the default behavior is to succeed if 1 copy (the primary) can perform the write.
The index operation only returns after all active shards within the replication group have indexed the document (sync replication).
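To make the arithmetic in the quoted text concrete, here is a minimal Python sketch of how the required number of active shard copies could be derived from the consistency level. It assumes the quorum is computed as int(copies / 2) + 1 and is only applied when there are more than two copies, which is how I read the note about 1 replica above. `required_active_copies` is a hypothetical helper for illustration, not an Elasticsearch API.

```python
def required_active_copies(number_of_replicas: int, consistency: str = "quorum") -> int:
    """Return how many shard copies must be active before a write may start.

    Hypothetical helper: it mirrors the documented rule as I understand it,
    it is not taken from the Elasticsearch code base.
    """
    copies = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return copies
    if consistency == "quorum":
        # Assumption: with 1 replica (2 copies) the quorum rule is not
        # applied and the primary alone is enough, as the note says.
        if copies <= 2:
            return 1
        return copies // 2 + 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


if __name__ == "__main__":
    for replicas in range(5):
        print(replicas, "replica(s) ->", required_active_copies(replicas))
```

Under these assumptions, 2 replicas (3 copies) require 2 active copies and 4 replicas (5 copies) require 3.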
My question is about the last paragraph:
The index operation only returns after all active shards within the replication group have indexed the document (sync replication).
If write_consistency=quorum (the default) and all shards are live (no node failures, no network partition), then:
1) Does the index operation return as soon as a quorum of shards has finished indexing (even though all shards are live/active)?
2) Or does the index operation return only when all live/active shards have finished indexing (i.e. the quorum is considered only in case of failures/timeouts)?
In the first case, reads may be eventually consistent (they may return stale data), but writes are quicker.
In the second case, reads are consistent (as long as there are no network partitions), but writes are slower (because they wait for the slowest shard/node).
Does anyone know how it works?
Another thing I wonder about is why the default value of the 'preference' param (in get/search requests) is randomized rather than _local (which I suppose would be more efficient).
Recommended answer
I think I can answer my own question now :)
Regarding the first question: after re-reading the documentation (this and this) a few times :) I realized that this statement should be right:
The index operation returns when all live/active shards have finished indexing, regardless of the consistency param. The consistency param may only prevent the operation from starting if there are not enough available shards (nodes).
So, for example, if there are 3 shards (one primary and two replicas) and all of them are live/available, the operation will wait for all 3, regardless of the consistency param (even with consistency=one).
This makes the system consistent (at least the document API part), unless there is a network partition.
However, I haven't had a chance to test this yet.
UPDATE: by consistency here I don't mean ACID consistency; I only mean the guarantee that all replicas have been updated by the time the request returns.
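The two-step behavior described for the first question can be sketched as a toy model in Python. This is only an illustration of my reading of the docs, not Elasticsearch code: `ShardCopy` and `index_document` are hypothetical names, and `required_active` stands in for whatever number the consistency level resolves to.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    name: str
    active: bool = True
    indexed: bool = False


def index_document(copies, required_active):
    """Toy model of one index request against a replication group.

    Returns the names of the copies that had indexed the document
    by the time the call returned.
    """
    active = [c for c in copies if c.active]
    # Step 1: pre-flight check. The consistency level only decides
    # whether the operation is allowed to start at all.
    if len(active) < required_active:
        raise RuntimeError(
            f"not enough active shard copies: {len(active)} < {required_active}"
        )
    # Step 2: sync replication. Every active copy indexes the document
    # before the call returns, whatever the consistency level was.
    for copy in active:
        copy.indexed = True
    return [c.name for c in active]


if __name__ == "__main__":
    group = [ShardCopy("primary"), ShardCopy("replica-1"), ShardCopy("replica-2")]
    # required_active=1 plays the role of consistency=one
    print(index_document(group, required_active=1))
    # prints ['primary', 'replica-1', 'replica-2']
```

In this model, even with a requirement of 1, all three active copies have the document when the call returns; the requirement only matters when too few copies are active to begin with.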
Regarding the second question: the obvious answer is that it is randomized to spread the load. On the other hand, a client can pick a random node to talk to, but that is probably not 100% efficient, as a single request may need multiple shards.
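To illustrate the load-spreading argument, here is a small Python simulation of the two strategies. It is a simplified model with hypothetical node names, not how Elasticsearch routes requests internally: the default picks a random copy, while `_local` prefers a copy on the node that received the request and falls back to a random copy otherwise.

```python
import random
from collections import Counter


def pick_copy(copy_nodes, local_node, preference=None, rng=random):
    """Pick the node whose shard copy will serve a read.

    copy_nodes: nodes holding a copy of the shard (hypothetical names).
    local_node: the node that received the request.
    preference: None for the randomized default, or "_local".
    """
    if preference == "_local" and local_node in copy_nodes:
        return local_node  # serve from the local copy when there is one
    return rng.choice(copy_nodes)  # otherwise spread reads across copies


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    # Every request arrives at node-a, e.g. a client pinned to one node.
    randomized = Counter(pick_copy(nodes, "node-a") for _ in range(9000))
    local = Counter(pick_copy(nodes, "node-a", "_local") for _ in range(9000))
    print("randomized:", dict(randomized))
    print("_local:    ", dict(local))
```

With a client that always talks to the same node, `_local` sends every read to that node's copy, while the randomized default spreads reads over all three copies.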