Downloading from AWS S3 while the file is being updated

Question

This may seem like a really basic question, but if I am downloading a file from S3 while it is being updated by another process, do I have to worry about getting an incomplete file?

Example: a 200MB CSV file. User A starts to update the file with 200MB of new content at 1Mbps. 16 seconds later, User B starts downloading the file at 200Mbps. Does User B get all 200MB of the original file, or does User B get ~2MB of User A's changes and nothing else?

Recommended answer

User B gets all 200MB of the original file.
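
The timeline in the question can be checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 1,000,000 bytes, 1 Mbps = 1,000,000 bits per second) and constant transfer rates; the helper names are made up for illustration.

```python
MB = 1_000_000  # assumption: decimal megabytes


def transfer_seconds(size_bytes, rate_mbps):
    """Seconds needed to move size_bytes at rate_mbps (megabits per second)."""
    return size_bytes * 8 / (rate_mbps * 1_000_000)


def bytes_sent(seconds, rate_mbps):
    """Bytes transferred after `seconds` at rate_mbps."""
    return seconds * rate_mbps * 1_000_000 / 8


file_size = 200 * MB
download_start = 16  # User B starts 16 s after User A

# User A's upload at 1 Mbps
upload_done_at = transfer_seconds(file_size, 1)
uploaded_when_b_starts = bytes_sent(download_start, 1)

# User B's download at 200 Mbps
download_done_at = download_start + transfer_seconds(file_size, 200)

print(f"A has uploaded {uploaded_when_b_starts / MB:.0f} MB when B starts")
print(f"B finishes at t={download_done_at:.0f} s")
print(f"A finishes at t={upload_done_at:.0f} s")
```

Under these assumptions, User B's download is over long before User A's upload completes, so the new object does not exist as far as readers are concerned for the whole duration of the download.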

Here's why:

PUT operations on S3 are atomic. There's technically no such thing as "modifying" an object. When an object is overwritten, it is replaced with another object that has the same key. The original object is not replaced until the new (overwriting) object has been uploaded in its entirety, and successfully. Even then, the overwritten object is not technically "gone" yet: it has only been replaced in the bucket's index, so that future requests will be served the new object.

(Serving the new object is actually documented as not guaranteed to happen immediately. In contrast with uploads of new objects, which are immediately available for download, overwrites of existing objects are eventually consistent: for a short period after you upload the new object, it is possible, however unlikely, that the old copy will still be served for subsequent requests. This reflects the consistency model documented when this answer was written; since December 2020, S3 provides strong read-after-write consistency for overwrite PUTs as well.)

But when you overwrite an object, and versioning is not enabled on the bucket, the old and new objects are actually stored independently in S3, in spite of having the same key. The old object is no longer referenced by the bucket's index, so you are no longer billed for storing it, and it will shortly be purged from S3's backing store. It's not actually documented how soon that happens, but (tl;dr) overwriting an object that is currently being downloaded should not cause any unexpected side effects.
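
The behavior described above can be illustrated with a toy in-memory model. This is only a sketch of the idea (independently stored blobs plus a key-to-blob index that is swapped once the upload completes); it is not how S3 is implemented, and `ToyBucket` and its methods are hypothetical names invented for this example.

```python
import itertools


class ToyBucket:
    """Toy model: independently stored blobs plus a key -> blob-id index."""

    def __init__(self):
        self._blobs = {}   # blob id -> bytearray
        self._index = {}   # key -> blob id currently served for that key
        self._ids = itertools.count(1)

    def begin_put(self):
        """Start an upload; the data lands in a fresh blob, not in the index."""
        blob_id = next(self._ids)
        self._blobs[blob_id] = bytearray()
        return blob_id

    def upload_chunk(self, blob_id, chunk):
        self._blobs[blob_id] += chunk

    def complete_put(self, key, blob_id):
        """Only now does the key point at the new blob (one index swap)."""
        self._index[key] = blob_id
        # The old blob is left in place here; a real store would purge it later.

    def open_get(self, key, chunk_size=4):
        """Resolve the key once, at request time, then stream that blob."""
        data = self._blobs[self._index[key]]

        def stream():
            for i in range(0, len(data), chunk_size):
                yield bytes(data[i:i + chunk_size])

        return stream()


bucket = ToyBucket()

# The original object.
h1 = bucket.begin_put()
bucket.upload_chunk(h1, b"old-old-old!")
bucket.complete_put("data.csv", h1)

# User A starts overwriting and is only partway through...
h2 = bucket.begin_put()
bucket.upload_chunk(h2, b"new-")

# ...when User B starts downloading.
download = bucket.open_get("data.csv")
first_chunk = next(download)

# User A finishes while User B's download is still in flight.
bucket.upload_chunk(h2, b"new-new!")
bucket.complete_put("data.csv", h2)

body_b = first_chunk + b"".join(download)
body_later = b"".join(bucket.open_get("data.csv"))

print(body_b)      # the complete original object
print(body_later)  # the complete new object
```

In this model, User B's in-flight download keeps reading the blob it resolved at the start, so it receives the whole original object, while any request made after the index swap receives the whole new one. At no point is a mix of the two served.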

Updates to a single key are atomic. For example, if you PUT to an existing key, a subsequent read might return the old data or the updated data, but it will never write corrupted or partial data.

http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
