What is the maximum Amazon S3 replication time on file upload?


Question

We use Amazon S3 in our project as storage for files uploaded by clients.

For technical reasons, we upload a file to S3 with a temporary name, then process its contents and rename the file after it has been processed.

The 'rename' operation fails time after time with a 404 (key not found) error, although the file being renamed had been uploaded successfully.

The Amazon documentation mentions this problem:

Amazon S3 achieves high availability by replicating data across multiple servers within Amazon's data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:

We implemented a kind of polling as a workaround: retry the 'rename' operation until it succeeds. The polling stops after 20 seconds.

This workaround works in most cases: the file gets replicated within a few seconds. But sometimes, very rarely, 20 seconds is not enough; the replication in S3 takes more time.
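A polling workaround of this kind might look roughly like the sketch below. It is not the asker's actual code: `KeyNotFound` is a hypothetical stand-in for whatever exception your S3 client raises on a 404, and `operation` is any callable that performs the rename (in S3 terms, typically a copy to the new key followed by a delete of the old one). The backoff values are arbitrary assumptions.

```python
import time


class KeyNotFound(Exception):
    """Hypothetical stand-in for the 404 (key not found) error from S3."""


def retry_until_found(operation, timeout=20.0, initial_delay=0.2, max_delay=2.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it stops raising KeyNotFound or the timeout passes.

    Returns whatever operation() returns. Re-raises the last KeyNotFound
    when the next wait would go past the deadline.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            return operation()
        except KeyNotFound:
            if clock() + delay > deadline:
                raise  # give up: the key did not become visible in time
            sleep(delay)
            delay = min(delay * 2, max_delay)  # back off between attempts
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting. A longer `timeout` makes failures rarer, but under eventual consistency no fixed timeout is guaranteed to be enough.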

  • What is the maximum time you have observed between a successful PUT operation and complete replication on Amazon S3?

  • Does Amazon S3 offer a way to 'bypass' replication? (Query the 'master' directly?)

Recommended answer

Update: this answer uses some older terminology, which I have left in place, for the most part. AWS has changed the friendly name of "US-Standard" to be more consistent with the naming of other regions, but its regional endpoint for IPv4 still has the unusual name s3-external-1.amazonaws.com.

The us-east-1 region of S3 has an IPv4/IPv6 "dual stack" endpoint that follows the standard convention, s3.dualstack.us-east-1.amazonaws.com, and if you are IPv6-enabled, this endpoint seems operationally equivalent to s3-external-1, as discussed below.

The documented references to geographic routing of requests for this region seem to have largely disappeared, without much comment, but anecdotal evidence suggests that the following information is still relevant to that region.

Q. Wasn't there a US Standard region?

We renamed the US Standard Region to US East (Northern Virginia) Region to be consistent with AWS regional naming conventions.

https://aws.amazon.com/s3/faqs/#regions

Buckets using the S3 Transfer Acceleration feature use a global-style endpoint of ${bucketname}.s3-accelerate.amazonaws.com, and it is not yet evident how this endpoint behaves with regard to us-east-1 buckets and eventual consistency, though it stands to reason that other regions should not be affected by this feature, if enabled. This feature improves transfer throughput for users who are more distant from the bucket by routing requests to the same S3 endpoints but proxying through the AWS "Edge Network," the same system that powers CloudFront. It is, essentially, a self-configuring path through CloudFront but without caching enabled. The acceleration comes from optimized network stacks and from keeping the traffic on the managed AWS network for much of its path across the Internet. As such, this feature should have no impact on consistency if you enable and use it on a bucket... but, as I mentioned, how it interacts with us-east-1 buckets is not yet known.

The US-Standard (us-east-1) region is the oldest, and presumably largest, region of S3, and it does play by some different rules than the other, newer regions.

An important and relevant difference is the consistency model.

Amazon S3 buckets in [all regions except US Standard] provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard region provide eventual consistency.

http://aws.amazon.com/s3/faqs/

This is why I assumed you were using US Standard. The behavior you described is consistent with that design constraint.

You should be able to verify that this doesn't happen with a test bucket in another region... but, because data transfer from EC2 to S3 within the same region is free and very low latency, using a bucket in a different region may not be practical.

There is another option that is worth trying, and it has to do with the inner workings of US-Standard.

US Standard is in fact geographically-distributed between Virginia and Oregon, and requests to "s3.amazonaws.com" are selectively routed via DNS to one location or another. This routing is largely a black box, but Amazon has exposed a workaround.

You can force your requests to be routed only to Northern Virginia by changing your endpoint from "s3.amazonaws.com" to "s3-external-1.amazonaws.com" ...

http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

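... and this is speculation on my part, but the geographic routing of your requests may be exacerbating your issue, and forcing them to "s3-external-1" (which, to be clear, is still US-Standard) might improve or eliminate it.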
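As a rough illustration of the endpoint swap, the sketch below rewrites an S3 URL on the global endpoint so that it targets s3-external-1 instead. The function name is hypothetical, and it assumes that virtual-hosted-style addressing (bucket name as a hostname prefix) works the same way on the s3-external-1 endpoint, which you should verify for your bucket.

```python
from urllib.parse import urlsplit, urlunsplit

GLOBAL_ENDPOINT = "s3.amazonaws.com"
NORTHERN_VIRGINIA_ENDPOINT = "s3-external-1.amazonaws.com"


def pin_to_northern_virginia(url):
    """Rewrite an S3 URL on the global endpoint to use s3-external-1 instead.

    Handles path-style (s3.amazonaws.com/bucket/key) and
    virtual-hosted-style (bucket.s3.amazonaws.com/key) URLs.
    Any other host is returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.netloc
    if host == GLOBAL_ENDPOINT:
        host = NORTHERN_VIRGINIA_ENDPOINT
    elif host.endswith("." + GLOBAL_ENDPOINT):
        # Keep the bucket prefix and swap only the endpoint part.
        bucket = host[: -len("." + GLOBAL_ENDPOINT)]
        host = bucket + "." + NORTHERN_VIRGINIA_ENDPOINT
    return urlunsplit(parts._replace(netloc=host))
```

If you go through an SDK rather than building URLs yourself, look for its endpoint override setting instead; in boto3, for example, that is the `endpoint_url` argument to `boto3.client("s3", ...)`.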

Update: The advice above has officially risen above speculation, but I'll leave it for historical reference. About a year after I wrote the above, Amazon indeed announced that US-Standard does offer read-after-write consistency on new object creation, but only when the s3-external-1 endpoint is used. They explain it as though it's a new behavior, and that may be the case... but it also may simply be a change in the behavior the platform officially supports. Either way:

Starting [2015-06-19], the US Standard Region now supports read-after-write consistency for new objects added to Amazon S3 using the Northern Virginia endpoint (s3-external-1.amazonaws.com). With this change, all Amazon S3 Regions now support read-after-write consistency. Read-after-write consistency allows you to retrieve objects immediately after creation in Amazon S3. Prior to this change, Amazon S3 buckets in the US Standard Region provided eventual consistency for newly created objects, which meant that some small set of objects might not have been available to read immediately after new object upload. These occasional delays could complicate data processing workflows where applications need to read objects immediately after creating the objects. Please note that in US Standard Region, this consistency change applies to the Northern Virginia endpoint (s3-external-1.amazonaws.com). Customers using the global endpoint (s3.amazonaws.com) should switch to using the Northern Virginia endpoint (s3-external-1.amazonaws.com) in order to leverage the benefits of this read-after-write consistency in the US Standard Region. [emphasis added]

https://forums.aws.amazon.com/ann.jspa?annID=3112

If you are uploading a large number of files (hundreds per second), you might also be overwhelming S3's sharding mechanism. For very high numbers of uploads per second, it's important that your keys ("filenames") not be lexically sequential.
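One common way to keep keys from being lexically sequential is to prepend a short hash of the name. The sketch below is only an illustration of that idea: the function name, the choice of SHA-256, and the 4-character prefix length are all assumptions, not something S3 requires.

```python
import hashlib


def spread_key(key, prefix_len=4):
    """Prepend a short hex hash of the key so sequential names do not sort together."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"
```

The original name stays at the end of the key, so it can still be recovered by splitting on the first hyphen. The trade-off is that listing objects in upload order is no longer possible by key prefix alone.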

Depending on how Amazon handles DNS, you may also want to try an alternate way of addressing your bucket, if your code can handle it.

Buckets in US-Standard can be addressed either with http://mybucket.s3.amazonaws.com/key ... or http://s3.amazonaws.com/mybucket/key ... and the internal implementation of these two could, at least in theory, differ in a way that is relevant to your issue.
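To make the two addressing forms concrete, here is a minimal sketch that builds both URLs for the same object. The helper names and the `endpoint` and `scheme` parameters are hypothetical, introduced only for illustration.

```python
from urllib.parse import quote


def path_style_url(bucket, key, endpoint="s3.amazonaws.com", scheme="http"):
    """Bucket name appears as the first path segment."""
    return f"{scheme}://{endpoint}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket, key, endpoint="s3.amazonaws.com", scheme="http"):
    """Bucket name appears as a hostname prefix, so DNS resolves per bucket."""
    return f"{scheme}://{bucket}.{endpoint}/{quote(key)}"
```

With the defaults, `path_style_url("mybucket", "key")` and `virtual_hosted_url("mybucket", "key")` yield the two URLs quoted above. Because the hostnames differ, the two forms may resolve through DNS differently, which is the reason it could be worth testing both.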
