GET Consistency (and Quorum) in ElasticSearch

Question

I am new to ElasticSearch and I am evaluating it for a project.

In ES, replication can be sync or async. In the async case, success is returned to the client as soon as the document is written to the primary shard, and the document is then pushed to the other replicas asynchronously.

When writing asynchronously, how do we ensure that a GET returns the data even if it has not yet propagated to all the replicas? When we do a GET in ES, the request is forwarded to one of the replicas of the appropriate shard. Since we are writing asynchronously, the primary shard may have the document while the replica selected for the GET has not yet received/written it. In Cassandra, we can specify consistency levels (ONE, QUORUM, ALL) for writes as well as reads. Is something like that possible for reads in ES?
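
To make the scenario concrete, here is a toy, in-memory model (plain Python, no Elasticsearch involved) of one primary shard and its replicas under async replication. ToyShardGroup and its methods are hypothetical names for illustration only; the model simply shows why a GET routed to a lagging replica can miss a document the primary has already acknowledged.

```python
from collections import deque
from typing import Optional


class ToyShardGroup:
    """Hypothetical model of one primary shard plus its replicas (not ES code)."""

    def __init__(self, num_replicas: int) -> None:
        self.primary: dict = {}
        self.replicas: list = [{} for _ in range(num_replicas)]
        # Writes acknowledged by the primary but not yet copied to the replicas.
        self.pending: deque = deque()

    def index_async(self, doc_id: str, doc: dict) -> str:
        # Async replication: acknowledge as soon as the primary has the document.
        self.primary[doc_id] = doc
        self.pending.append((doc_id, doc))
        return "created"

    def replicate_pending(self) -> None:
        # Background step that eventually brings every replica up to date.
        while self.pending:
            doc_id, doc = self.pending.popleft()
            for replica in self.replicas:
                replica[doc_id] = doc

    def get(self, doc_id: str, copy: int) -> Optional[dict]:
        # copy 0 is the primary; 1..n are replicas. A GET may be routed to any copy.
        store = self.primary if copy == 0 else self.replicas[copy - 1]
        return store.get(doc_id)


group = ToyShardGroup(num_replicas=1)
group.index_async("1", {"title": "hello"})
print(group.get("1", copy=0))  # {'title': 'hello'}: the primary has it
print(group.get("1", copy=1))  # None: the replica has not caught up yet
group.replicate_pending()
print(group.get("1", copy=1))  # {'title': 'hello'}
```

The window between index_async() returning and replicate_pending() running is exactly the period in which a read from a replica can come back empty.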

Answer

Right, you can set replication to async (the default is sync) so that writes don't wait for the replicas, although in practice this doesn't buy you much.
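
In the Elasticsearch 1.x releases this answer refers to, the replication mode was chosen per request through the replication query-string parameter (sync, the default, or async). That parameter was removed in Elasticsearch 2.0, so the following applies only to 1.x clusters. It is a minimal sketch that just builds such a request URL; index_url and the blog/post names are hypothetical.

```python
from urllib.parse import quote, urlencode


def index_url(base: str, index: str, doc_type: str, doc_id: str,
              replication: str = "sync") -> str:
    """Build an Elasticsearch 1.x index URL: {base}/{index}/{type}/{id}?replication=...

    The replication query parameter only exists in 1.x; it was removed in 2.0.
    """
    if replication not in ("sync", "async"):
        raise ValueError("replication must be 'sync' or 'async'")
    # Percent-encode each path segment so an id containing '/' stays one segment.
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    query = urlencode({"replication": replication})
    return f"{base.rstrip('/')}/{path}?{query}"


print(index_url("http://localhost:9200", "blog", "post", "1", replication="async"))
# http://localhost:9200/blog/post/1?replication=async
```

The URL would be sent as a PUT with the document as the JSON body, for example curl -XPUT 'http://localhost:9200/blog/post/1?replication=async' -d '{"title": "hello"}'.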

Whenever you read data, you can specify the preference parameter to control which shard copy the documents are taken from. If you use preference=_primary, you ensure that the document is always taken from the primary shard; otherwise, if the get is done before the document is available on all replicas, you might hit a shard that doesn't have it yet. Given that the get api works in real time, it usually makes sense to keep replication sync, so that once the index operation has returned you can always get the document back by id from any shard that is supposed to contain it. Still, if you try to get a document back while it is being indexed for the first time, you may not find it.
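
A GET that must be served by the primary passes preference=_primary in the query string. The sketch below only builds such a URL for the 1.x-style /index/type/id path; get_url and the index/type names are hypothetical. Note that the _primary preference value was deprecated in Elasticsearch 6.1 and removed in 7.0, so this only works against older clusters.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def get_url(base: str, index: str, doc_type: str, doc_id: str,
            preference: Optional[str] = None) -> str:
    """Build a GET-by-id URL: {base}/{index}/{type}/{id}[?preference=...]."""
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    url = f"{base.rstrip('/')}/{path}"
    if preference is not None:
        # "_primary" pins the read to the primary shard copy (removed in ES 7.0).
        url += "?" + urlencode({"preference": preference})
    return url


print(get_url("http://localhost:9200", "blog", "post", "1", preference="_primary"))
# http://localhost:9200/blog/post/1?preference=_primary
```

Without a preference, the request may be served by any copy of the shard, which is what makes the stale read possible under async replication.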

There is a write consistency parameter in elasticsearch as well, but it differs from how other data stores work, and it is not related to whether replication is sync or async. With the consistency parameter you control how many copies of the data need to be available for a write operation to be permitted. If not enough copies of the data are available, the write operation fails (after waiting up to 1 minute, an interval you can change through the timeout parameter). This is only a preliminary check that decides whether to accept the operation. It does not mean that if the operation fails on a replica it will be rolled back. In fact, if a write operation fails on a replica but succeeds on the primary, the assumption is that something is wrong with the replica (or the hardware it is running on), so the shard is marked as failed and recreated on another node. The default value for consistency is quorum, and it can also be set to one or all.
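
The consistency pre-check can be sketched as simple arithmetic over the number of shard copies (the primary plus its replicas). This follows the rule as documented for Elasticsearch 1.x/2.x, including the documented special case that with one replica (two copies) a quorum is satisfied by the primary alone. required_active_copies and write_permitted are hypothetical helper names, not Elasticsearch APIs. In Elasticsearch 5.0 the consistency parameter was replaced by wait_for_active_shards.

```python
def required_active_copies(consistency: str, num_replicas: int) -> int:
    """Active shard copies (primary + replicas) needed before a write is accepted.

    Sketch of the documented ES 1.x/2.x rule; not Elasticsearch source code.
    """
    total = num_replicas + 1  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total
    if consistency == "quorum":
        # Documented special case: with 1 replica (2 copies) the primary alone suffices.
        return total // 2 + 1 if total > 2 else 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


def write_permitted(consistency: str, num_replicas: int, active_copies: int) -> bool:
    # Only a pre-check: Elasticsearch keeps retrying it until `timeout`
    # (1 minute by default) before failing, and passing it does not mean a
    # later failure on a replica will be rolled back.
    return active_copies >= required_active_copies(consistency, num_replicas)


for level in ("one", "quorum", "all"):
    print(level, required_active_copies(level, num_replicas=2))
# one 1
# quorum 2
# all 3
```

With two replicas (three copies), the default quorum needs two active copies, so a write can still be accepted while one replica is down, whereas all would wait and eventually fail.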

That said, when it comes to the get api, elasticsearch is not eventually consistent but simply consistent: once a document is indexed, you can retrieve it.

The fact that newly added documents are not available for search until the next refresh operation (which happens automatically every second by default) is not really about eventual consistency, since the documents are there and can be retrieved by id. It is more about how search and Lucene work, and how documents are made visible through Lucene.
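
The difference between retrieving by id and searching can be illustrated with another toy model. ToyIndex is hypothetical and does not reproduce how Elasticsearch or Lucene actually implement realtime GET. It only encodes the visibility rule described above: a document can be retrieved by id as soon as indexing returns, but search sees it only after the next refresh (automatic every second by default, or triggered explicitly through the _refresh API).

```python
from typing import Optional


class ToyIndex:
    """Hypothetical model of the visibility rule only (not how ES/Lucene work)."""

    def __init__(self) -> None:
        self._docs: dict = {}        # everything indexed so far: retrievable by id
        self._searchable: dict = {}  # snapshot made visible by the last refresh

    def index(self, doc_id: str, doc: dict) -> None:
        self._docs[doc_id] = doc

    def get(self, doc_id: str) -> Optional[dict]:
        # Realtime GET: sees the document as soon as index() has returned.
        return self._docs.get(doc_id)

    def refresh(self) -> None:
        # Elasticsearch does this automatically (every second by default).
        self._searchable = dict(self._docs)

    def search(self, word: str) -> list:
        # Naive substring match standing in for a real full-text query.
        return sorted(doc_id for doc_id, doc in self._searchable.items()
                      if word in doc.get("title", ""))


idx = ToyIndex()
idx.index("1", {"title": "hello world"})
print(idx.get("1"))         # {'title': 'hello world'}
print(idx.search("hello"))  # []: not refreshed yet
idx.refresh()
print(idx.search("hello"))  # ['1']
```

The document exists the whole time; only its visibility to search changes at refresh, which is why this delay is not a consistency issue.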