AWS DynamoDB read after write consistency - how does it work theoretically?

This article looks at how read-after-write consistency can work in AWS DynamoDB in theory, and may be a useful reference if you are facing the same question.

Question

Most NoSQL solutions use only eventual consistency. Given that DynamoDB replicates data across three datacenters, how is read-after-write consistency maintained?

What would be a generic approach to this kind of problem? I think it is interesting, since even in MySQL replication, data is replicated asynchronously.

Answer

I'll use MySQL to illustrate the answer, since you mentioned it, though, obviously, neither of us is implying that DynamoDB runs on MySQL.

In a single network with one MySQL master and any number of slaves, the answer seems extremely straightforward -- for eventual consistency, fetch the answer from a randomly-selected slave; for read-after-write consistency, always fetch the answer from the master.
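To make that routing rule concrete, here is a minimal sketch of the dispatch logic. The host names and the pick_host helper are purely illustrative assumptions, not any particular driver's API.

```python
import random

# Illustrative master/slave topology; host names are made up for the example.
MASTER = "db-master.example.internal"
SLAVES = ["db-slave-1.example.internal", "db-slave-2.example.internal"]

def pick_host(read_after_write_required: bool) -> str:
    """Route a read to the master when read-after-write consistency is needed,
    otherwise to a randomly selected slave (eventual consistency)."""
    if read_after_write_required:
        return MASTER                # the master has seen every committed write
    return random.choice(SLAVES)     # a slave may still be catching up

# A read issued immediately after a write must go to the master:
print(pick_host(read_after_write_required=True))
print(pick_host(read_after_write_required=False))
```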

"even in MySQL replication data is replicated asynchronously"

There's an important exception to that statement, and I suspect there's a good chance that it's closer to the reality of DynamoDB than any other alternative here: in a MySQL-compatible Galera cluster, replication among the masters is synchronous, because the masters collaborate on each transaction at commit-time, and a transaction that can't be committed to all of the masters will also throw an error on the master where it originated. A cluster like this technically can operate with only 2 nodes, but should not have fewer than three, because when there is a split in the cluster, any node that finds itself alone, or in a group smaller than half of the original cluster size, will roll itself up into a harmless little ball and refuse to service queries, because it knows it's in an isolated minority and its data can no longer be trusted. So three is something of a magic number in a distributed environment like this, to avoid a catastrophic split-brain condition.
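The "smaller than half of the original cluster" rule is just a majority-quorum check. A rough sketch of that rule, purely illustrative and not Galera's actual code:

```python
def can_serve_queries(reachable_nodes: int, original_cluster_size: int) -> bool:
    """A node keeps serving only while its partition holds a strict majority
    of the original cluster; a minority partition stops, avoiding split-brain."""
    return reachable_nodes > original_cluster_size / 2

# Why three is the magic number: with 3 nodes a 2-node partition survives,
# while with 2 nodes no partition can ever claim a strict majority.
print(can_serve_queries(2, 3))  # True  -- majority partition keeps serving
print(can_serve_queries(1, 3))  # False -- isolated node refuses queries
print(can_serve_queries(1, 2))  # False -- a 2-node cluster can't survive a split
```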

If we assume the "three geographically-distributed replicas" in DynamoDB are all "master" copies, they might operate with logic along the same lines as the synchronous masters you'd find in Galera, so the solution would be essentially the same, since that setup also allows any or all of the masters to still have conventional subtended asynchronous slaves using MySQL native replication. The difference there is that you could fetch from any of the masters that is currently connected to the cluster if you wanted read-after-write consistency, since all of them are in sync; otherwise fetch from a slave.
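With synchronous masters, the dispatch rule from the earlier sketch relaxes: a consistent read may go to any master still joined to the cluster. Again a hypothetical sketch; cluster_connected stands in for whatever health check the cluster exposes.

```python
import random

MASTERS = ["m1.example.internal", "m2.example.internal", "m3.example.internal"]
SLAVES = ["s1.example.internal", "s2.example.internal"]

def pick_host(read_after_write_required: bool, cluster_connected) -> str:
    """With synchronous masters, any master still in the cluster is equally
    authoritative; eventually-consistent reads can still go to async slaves."""
    if read_after_write_required:
        in_sync = [m for m in MASTERS if cluster_connected(m)]
        return random.choice(in_sync)
    return random.choice(SLAVES)

# Example: pretend m3 has dropped out of the cluster.
print(pick_host(True, cluster_connected=lambda m: m != "m3.example.internal"))
```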

The third scenario I can think of would be analogous to three geographically-dispersed MySQL masters in a circular replication configuration, which, again, supports subtended slaves off of each master, but has the additional problems that the masters are not synchronous and there is no conflict resolution capability -- not at all viable for this application, but for purposes of discussion, the objective could still be achieved if each "object" had some kind of highly-precise timestamp. When read-after-write consistency is needed, the solution here might be for the system serving the response to poll all of the masters to find the newest version, not returning an answer until all masters had been polled, or to read from a slave for eventual consistency.
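A hypothetical sketch of that "poll every master, return the newest version" idea. fetch_version is an assumed helper returning a (timestamp, value) pair for the object on one master; nothing is returned until every master has answered.

```python
def consistent_read(object_key, masters, fetch_version):
    """Poll all masters and return the value carrying the newest timestamp."""
    newest_ts, newest_value = None, None
    for master in masters:
        ts, value = fetch_version(master, object_key)  # one round trip per master
        if newest_ts is None or ts > newest_ts:
            newest_ts, newest_value = ts, value
    return newest_value  # only returned after all masters have been polled

# Usage with stub data standing in for three unsynchronized masters:
stub = {"m1": (101, "old"), "m2": (105, "new"), "m3": (103, "middle")}
print(consistent_read("some-key", stub, lambda m, k: stub[m]))  # -> "new"
```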

Essentially, if there's more than one "write master" then it would seem like the masters have no choice but to either collaborate at commit-time, or collaborate at consistent-read-time.

Interestingly, I think, in spite of some whining you can find in online opinion pieces about the disparity in pricing between the two read-consistency levels in DynamoDB, this analysis -- even as divorced from the reality of DynamoDB's internals as it is -- does seem to justify that discrepancy.

Eventually-consistent read replicas are essentially infinitely scalable (even with MySQL, where a master can easily serve several slaves, each of which can also easily serve several slaves of its own, each of which can serve several... ad infinitum) but read-after-write is not infinitely scalable, since by definition it would seem to require the involvement of a "more-authoritative" server, whatever that specifically means, thus justifying a higher price for reads where that level of consistency is required.
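For what it's worth, DynamoDB does expose this choice per read request: GetItem (and Query) accept a ConsistentRead flag, and strongly consistent reads consume twice the read capacity of eventually consistent ones, which is the pricing disparity mentioned above. A small boto3 example, with the table name and key as placeholder values:

```python
import boto3

table = boto3.resource("dynamodb").Table("Users")  # placeholder table name

# Eventually consistent read (the default): cheaper, but may briefly miss
# a write that was just acknowledged.
maybe_stale = table.get_item(Key={"user_id": "42"})

# Strongly consistent read: reflects all successful writes that completed
# before the read, at twice the read-capacity cost.
fresh = table.get_item(Key={"user_id": "42"}, ConsistentRead=True)
```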
