Elasticsearch read and write consistency


Question

Elasticsearch doesn't have a "read consistency" parameter (as Cassandra does), but it does have "write consistency" and "read preference".

The documentation says the following about write consistency:

Write Consistency
To prevent writes from taking place on the "wrong" side of a network partition, by default, index operations only succeed if a quorum (>replicas/2+1) of active shards are available. This default can be overridden on a node-by-node basis using the action.write_consistency setting. To alter this behavior per-operation, the consistency request parameter can be used.

Valid write consistency values are one, quorum, and all.

Note, for the case where the number of replicas is 1 (total of 2 copies of the data), then the default behavior is to succeed if 1 copy (the primary) can perform the write.
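The rule quoted above can be sketched as a small calculation. This is only a sketch, not Elasticsearch's code: the function name required_active_copies is hypothetical, and it assumes that "quorum" means a strict majority of all copies of a shard (primary plus replicas), with the special case from the note that a single replica needs only the primary.

```python
def required_active_copies(number_of_replicas: int, consistency: str = "quorum") -> int:
    """Return how many shard copies must be active before a write may start.

    Assumption (not taken from the Elasticsearch source): "quorum" is a strict
    majority of all copies, i.e. total // 2 + 1, where total = primary + replicas.
    """
    total_copies = number_of_replicas + 1  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    if consistency == "quorum":
        # With a single replica (2 copies) a majority would be 2, but the
        # documentation note says the primary alone is enough in that case.
        if total_copies <= 2:
            return 1
        return total_copies // 2 + 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


if __name__ == "__main__":
    # Print the assumed quorum size for a few replica counts.
    for replicas in (0, 1, 2, 3, 4):
        print(replicas, required_active_copies(replicas))
```

Under these assumptions, two replicas (three copies) would need two active copies for a quorum write to start, while one replica would need only the primary.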

The index operation only returns after all active shards within the replication group have indexed the document (sync replication).

My question is about the last paragraph:

The index operation only returns after all active shards within the replication group have indexed the document (sync replication).

If write_consistency=quorum (the default) and all shards are live (no node failures, no network partition), then:
1) Does the index operation return as soon as a quorum of shards has finished indexing (even though all shards are live/active)?
2) Or does the index operation return only when all live/active shards have finished indexing (i.e. the quorum is considered only in case of failures/timeouts)?

In the first case, reads may be eventually consistent (they may return stale data), but writes are quicker.
In the second case, reads are consistent (as long as there are no network partitions), but writes are slower (since they wait for the slowest shard/node).

Does anyone know how it works?

Another thing I wonder about is why the default value of the 'preference' parameter (in get/search requests) is randomized rather than _local (which I suppose would be more efficient).
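For reference, here is a minimal sketch of how a non-default preference could be attached to a search request. It only builds the URL and does not contact a cluster. The helper name search_url, the host http://localhost:9200 and the index name my_index are assumptions for illustration; the preference query parameter and the _local value are the ones mentioned in the question.

```python
from urllib.parse import urlencode


def search_url(base_url, index, preference=None):
    """Build a search URL, optionally with an explicit 'preference' value."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    if preference is not None:
        # Without this parameter the cluster applies its default preference.
        url += "?" + urlencode({"preference": preference})
    return url


if __name__ == "__main__":
    # Hypothetical host and index name, used only for illustration.
    print(search_url("http://localhost:9200", "my_index"))
    print(search_url("http://localhost:9200", "my_index", preference="_local"))
```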

Recommended answer

I think I can answer my own question now :)

Regarding the first question: after re-reading the documentation (this and this) a few times :) I realized that the following statement should be right:

The index operation returns when all live/active shards have finished indexing, regardless of the consistency parameter. The consistency parameter can only prevent the operation from starting if there are not enough available shards (nodes).

So, for example, if there are 3 shards (one primary and two replicas) and all of them are live/available, the operation will wait for all 3, regardless of the consistency parameter (even when consistency=one).
This makes the system consistent (at least the document API part), unless there is a network partition. However, I haven't had a chance to test this yet.
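The behavior described in this answer can be illustrated with a toy model. It is not Elasticsearch code and, like the answer itself, has not been verified against a real cluster. The names ShardCopy and index_document are hypothetical. The consistency level is reduced to a required_active count that is checked once before the write starts, after which every active copy is written before the call returns.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """One copy (primary or replica) of a shard in the toy model."""
    name: str
    active: bool = True
    docs: dict = field(default_factory=dict)


def index_document(copies, doc_id, doc, required_active):
    """Check availability first, then write to every active copy."""
    active = [c for c in copies if c.active]
    # Pre-flight check: the only place where the consistency level matters.
    if len(active) < required_active:
        raise RuntimeError(
            f"only {len(active)} active copies, need {required_active}"
        )
    # Synchronous replication: return only after all active copies are written.
    for copy in active:
        copy.docs[doc_id] = doc
    return [c.name for c in active]


if __name__ == "__main__":
    group = [ShardCopy("primary"), ShardCopy("replica-1"), ShardCopy("replica-2")]
    # Even with required_active=1 (consistency=one), all three copies are written.
    print(index_document(group, "1", {"title": "hello"}, required_active=1))
```

In this model, lowering the required count never makes the call return earlier; it only changes whether the write is allowed to start when some copies are unavailable.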

UPDATE: by consistency here I don't mean ACID consistency; I just mean the guarantee that all replicas have been updated by the time the request returns.

Regarding the second question: the obvious answer is that it is randomized to spread the load. On the other hand, a client could pick a random node to talk to, but that is probably not 100% efficient, as a single request may need multiple shards.
