What is maximum Amazon S3 replication time on file upload?

Problem description

We use Amazon S3 in our project as storage for files uploaded by clients.

For technical reasons, we upload a file to S3 with a temporary name, then process its contents and rename the file once it has been processed.
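For illustration, a "rename" on S3 is really a copy followed by a delete. A minimal sketch of this flow, assuming the boto3 SDK (the bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "client-uploads"  # hypothetical bucket name

# Upload under a temporary key first.
s3.upload_file("/tmp/incoming.bin", BUCKET, "tmp/upload-123")

# ... process the object's contents ...

# S3 has no native rename: copy to the final key, then delete the original.
s3.copy_object(
    Bucket=BUCKET,
    CopySource={"Bucket": BUCKET, "Key": "tmp/upload-123"},
    Key="processed/upload-123",
)
s3.delete_object(Bucket=BUCKET, Key="tmp/upload-123")
```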

The 'rename' operation fails time after time with a 404 (key not found) error, although the file being renamed had been uploaded successfully.

The Amazon documentation mentions this issue (http://api-portal.anypoint.mulesoft.com/amazon/api/amazon-s3-api/docs/concepts#DataConsistencyModel):

Amazon S3 achieves high availability by replicating data across multiple servers within Amazon's data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:

We implemented a kind of polling as a workaround: retry the 'rename' operation until it succeeds.
The polling stops after 20 seconds.
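A sketch of that polling workaround, again assuming boto3 (the 404 surfaces from copy_object as a ClientError whose error code may be "NoSuchKey" or "404"):

```python
import time
from botocore.exceptions import ClientError

def rename_with_retry(s3, bucket, src_key, dst_key,
                      timeout=20.0, interval=1.0):
    """Retry copy+delete until the source key becomes visible,
    giving up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            s3.copy_object(
                Bucket=bucket,
                CopySource={"Bucket": bucket, "Key": src_key},
                Key=dst_key,
            )
            s3.delete_object(Bucket=bucket, Key=src_key)
            return
        except ClientError as err:
            # Anything other than "key not found" is a real error.
            if err.response["Error"]["Code"] not in ("404", "NoSuchKey"):
                raise
            if time.monotonic() >= deadline:
                raise  # timeout elapsed; replication still lagging
            time.sleep(interval)
```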

This workaround works in most cases: the file gets replicated within a few seconds.
But sometimes, very rarely, 20 seconds is not enough; the replication in S3 takes more time.

  • What is the maximum time you observed between a successful PUT operation and complete replication on Amazon S3?

  • Does Amazon S3 provide a way to 'bypass' the replication? (Querying the master directly?)

Answer

The US-Standard (us-east-1) region is the oldest, and presumably largest, region of S3, and does play by some different rules than the other, newer regions.

An important and relevant difference is the consistency model.

Amazon S3 buckets in [all regions except US Standard] provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard region provide eventual consistency.

http://aws.amazon.com/s3/faqs/

This is why I assumed you were using US Standard. The behavior you described is consistent with that design constraint.

You should be able to verify that this doesn't happen with a test bucket in another region... but, because data transfer from EC2 to S3 within the same region is free and very low latency, using a bucket in a different region may not be practical.

There is another option worth trying that has to do with the inner workings of US-Standard.

US Standard is in fact geographically-distributed between Virginia and Oregon, and requests to "s3.amazonaws.com" are selectively routed via DNS to one location or another. This routing is largely a black box, but Amazon has exposed a workaround.

You can force your requests to be routed only to Northern Virginia by changing your endpoint from "s3.amazonaws.com" to "s3-external-1.amazonaws.com" ...

http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
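With boto3, pinning the endpoint is a one-line client setting (a sketch; the bucket name is hypothetical):

```python
import boto3

# Route all requests to the Northern Virginia endpoint instead of
# letting DNS pick a location behind the global "s3.amazonaws.com".
s3 = boto3.client("s3", endpoint_url="https://s3-external-1.amazonaws.com")

s3.head_object(Bucket="client-uploads", Key="processed/upload-123")
```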

...this is speculation on my part, but your issue could be exacerbated by the geographic routing of your requests, and forcing them to s3-external-1 (which, to be clear, is still US-Standard) might improve or eliminate your issue.

Update: The advice above has officially risen above speculation, but I'll leave it for historical reference. About a year after I wrote the above, Amazon indeed announced that US-Standard does offer read-after-write consistency on new object creation, but only when the s3-external-1 endpoint is used. They explain it as though it's a new behavior, and that may be the case... but it may also simply be a change in the behavior the platform officially supports. Either way:

Starting [2015-06-19], the US Standard Region now supports read-after-write consistency for new objects added to Amazon S3 using the Northern Virginia endpoint (s3-external-1.amazonaws.com). With this change, all Amazon S3 Regions now support read-after-write consistency. Read-after-write consistency allows you to retrieve objects immediately after creation in Amazon S3. Prior to this change, Amazon S3 buckets in the US Standard Region provided eventual consistency for newly created objects, which meant that some small set of objects might not have been available to read immediately after new object upload. These occasional delays could complicate data processing workflows where applications need to read objects immediately after creating the objects. Please note that in US Standard Region, this consistency change applies to the Northern Virginia endpoint (s3-external-1.amazonaws.com). Customers using the global endpoint (s3.amazonaws.com) should switch to using the Northern Virginia endpoint (s3-external-1.amazonaws.com) in order to leverage the benefits of this read-after-write consistency in the US Standard Region. [emphasis added]

https://forums.aws.amazon.com/ann.jspa?annID=3112

If you are uploading a large number of files (hundreds per second), you might also be overwhelming S3's sharding mechanism. For very high numbers of uploads per second, it's important that your keys ("filenames") not be lexically sequential.
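One common way to avoid lexically sequential keys is to prepend a short hash of the key, so that writes spread across S3's internal partitions. A sketch (the four-character prefix length is an arbitrary choice):

```python
import hashlib

def sharded_key(original_key: str) -> str:
    """Prepend a short hex hash so keys are not lexically sequential."""
    prefix = hashlib.md5(original_key.encode()).hexdigest()[:4]
    return f"{prefix}/{original_key}"

# Sequential names like these now land under scattered prefixes:
print(sharded_key("2015/06/19/upload-000123"))
print(sharded_key("2015/06/19/upload-000124"))
```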

Depending on how Amazon handles DNS, you may also want to try an alternate way of addressing your bucket, if your code can handle it.

Buckets in US-Standard can be addressed either with http://mybucket.s3.amazonaws.com/key ... or http://s3.amazonaws.com/mybucket/key ... and the internal implementations of these two could, at least in theory, differ in a way that is relevant to your issue.
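If you use boto3, both addressing variants can be selected explicitly via the client config, so the experiment is easy to run (a sketch):

```python
import boto3
from botocore.config import Config

# Virtual-hosted style: http://mybucket.s3.amazonaws.com/key
s3_vhost = boto3.client("s3", config=Config(s3={"addressing_style": "virtual"}))

# Path style: http://s3.amazonaws.com/mybucket/key
s3_path = boto3.client("s3", config=Config(s3={"addressing_style": "path"}))
```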
