Is creating a new version of an object in AWS S3 eventually consistent or read-after-write consistent?


Question

I see from Amazon's documentation that writing a new object to S3 is read-after-write consistent, but that update and delete operations are eventually consistent. I would guess that pushing a new version of an object with versioning turned on would be eventually consistent like an update, but I can't find any documentation to confirm. Does anyone know?

Edit: My question is regarding the behavior of a GET with or without an explicit version specified.

I'd really like read-after-write behavior on updates for my project, which I may be able to simulate by doing inserts only, but it might be easier if pushing new versions of an object provided the desired behavior.
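To make the "inserts only" idea concrete, here is a minimal sketch in Python. It assumes a client object exposing `put_object` and `get_object` with the keyword arguments a boto3 S3 client accepts; the helper names and the `logical_name/<uuid>` key layout are hypothetical, chosen only for illustration. Every update is written under a key that has never existed, so each write counts as a PUT of a new object.

```python
import uuid


def put_as_new_object(client, bucket, logical_name, body):
    """Write `body` under a key that has never existed and return that key.

    `client` is assumed to expose put_object(Bucket=..., Key=..., Body=...),
    as a boto3 S3 client does. The key layout is a hypothetical convention.
    """
    key = f"{logical_name}/{uuid.uuid4().hex}"
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


def read_object(client, bucket, key):
    """Fetch the object stored under an exact key returned by put_as_new_object."""
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

The caller has to remember the returned key (for example in a database row) in order to find the latest data, which is the bookkeeping that versioning might otherwise save.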

Solution

As you already know...

Q: What data consistency model does Amazon S3 employ?

Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.

https://aws.amazon.com/s3/faqs/

...and that's about all there is, as far as official statements on the consistency model go.

However, I would suggest that the remainder can be extrapolated with a reasonable degree of certainty from this, along with assumptions we can reasonably make, plus some additional general insights into the inner workings of S3.

For example, we know that S3 does not actually store objects in a hierarchical structure. Instead, the documentation describes an index of keys:

Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index.

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

This implies that S3 has at least two discrete major components: a backing store where the data is persisted, and an index of keys pointing to locations in the backing store. We also know that both of them are distributed across multiple Availability Zones, and thus both of them are replicated.

The fact that the backing store is separate from the index is not a foregone conclusion until you remember that storage classes are selectable on a per-object basis, which almost necessarily means that the index and the data are stored separately.

From the fact that overwrite PUT operations are eventually-consistent, we can conclude that even in a non-versioned bucket, an overwrite is not in fact an overwrite of the backing store, but rather an overwrite of the index entry for that object's key, and an eventual freeing of the space in the backing store that's no longer referenced by the index.
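The index-plus-backing-store picture can be sketched as a toy model. This is purely illustrative and says nothing about how S3 is actually built; the class and method names are invented. An overwrite never touches the old blob, it only repoints the index entry, and unreferenced blobs are freed in a later pass.

```python
import itertools


class TinyObjectStore:
    """Toy model: an index of keys pointing into a separate backing store."""

    def __init__(self):
        self._blobs = {}   # backing store: location -> bytes
        self._index = {}   # index: key -> location
        self._locations = itertools.count(1)

    def put(self, key, data):
        location = next(self._locations)
        self._blobs[location] = data   # always write to a fresh location
        self._index[key] = location    # an overwrite only repoints the index

    def get(self, key):
        return self._blobs[self._index[key]]

    def blob_count(self):
        return len(self._blobs)

    def collect_garbage(self):
        """Free blobs that no index entry references any longer."""
        live = set(self._index.values())
        dead = [loc for loc in self._blobs if loc not in live]
        for loc in dead:
            del self._blobs[loc]
        return len(dead)
```

After two PUTs to the same key, `get` returns the newer data while both blobs still sit in the backing store; `collect_garbage` then frees the orphaned one.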

The implication I see in these assertions is that the indexes are replicated, and that it's possible for a read-after-overwrite (or delete) to hit a replica of the index that does not yet reflect the most recent overwrite. But when a read encounters a "no such key" condition in its local index, the system pursues a more resource-intensive path of interrogating the "master" index (whatever that may actually mean in the architecture of S3) to see whether such an object really does exist and the local index replica simply hasn't learned of it yet.

Since the first GET of a new object that has not yet replicated to the appropriate local index replica is almost certainly a rare occurrence, it is reasonable to expect that the architects of S3 allowed for a higher-cost "discovery" operation, to improve the user experience, whenever a node in the system believes this may be the condition it is encountering.
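The speculated lookup path can be written down as a small simulation. Everything here is an assumption made for illustration: a single "master" dictionary, one lagging replica, and an explicit `replicate` call standing in for asynchronous replication. It is a model of the conjecture, not of S3 itself.

```python
class ReplicatedIndex:
    """Toy model of a master index plus one lagging local replica."""

    def __init__(self):
        self.master = {}
        self.replica = {}

    def put(self, key, value):
        # Writes land on the master first; the replica learns of them later.
        self.master[key] = value

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        self.replica = dict(self.master)

    def get(self, key):
        if key in self.replica:
            # Cheap path: trust the local entry, even if it is stale.
            return self.replica[key]
        # "No such key" locally: take the costlier path and ask the master.
        return self.master[key]
```

In this model a GET right after the first PUT of a key falls through to the master and succeeds, while a GET right after an overwrite is served from the replica and returns the old value until `replicate` runs. That matches the two behaviors in the quoted FAQ.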

From all of this, I would suggest that the most likely behavior you would experience would be this:

  • GET without a versionId on a versioned object after an overwrite PUT would be eventually-consistent, since the node servicing the read request would not encounter the No Such Key condition, and would therefore not follow the theoretical higher-cost "discovery" model I speculated about above.

  • GET with an explicit request for the newest versionId would be immediately consistent on an overwrite PUT, since the reading node would likely launch the high-cost strategy to obtain upstream confirmation of whether its index reflected all the most-current data, although of course the condition here would be No Such Version, rather than No Such Key.
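If you want to try the second case, a read that names the exact version just written could look like the sketch below. It assumes a client whose `put_object` response carries a `VersionId` when the bucket has versioning enabled and whose `get_object` accepts a `VersionId` argument, as a boto3 S3 client does; the helper name is hypothetical. Whether such a read is immediately consistent is the speculation above, not something this code can guarantee.

```python
def put_and_read_exact_version(client, bucket, key, body):
    """Overwrite `key`, then read back the exact version that was just written.

    Returns a (version_id, data) tuple. `client` is assumed to behave like a
    boto3 S3 client pointed at a bucket with versioning enabled.
    """
    put_response = client.put_object(Bucket=bucket, Key=key, Body=body)
    version_id = put_response.get("VersionId")
    if version_id is None:
        # Without versioning there is no version ID to ask for.
        raise RuntimeError("no VersionId returned; is versioning enabled?")
    get_response = client.get_object(
        Bucket=bucket, Key=key, VersionId=version_id
    )
    return version_id, get_response["Body"].read()
```

With boto3 installed, `client` could be `boto3.client("s3")`; comparing the returned data with `body` across many overwrites would give some empirical evidence either way.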

I know speculation is not what you were hoping for, but absent documented confirmation or empirical (or maybe some really convincing anecdotal) evidence to the contrary, I suspect this is the closest we can come to drawing credible conclusions based on the publicly-available information about the S3 platform.
