Amazon S3, Syncing, Modified date vs. Uploaded Date


Question


We're using the AWS SDK for .NET, and I'm trying to pinpoint where we seem to be having a sync problem with our consumer applications. Basically, we have a push service that generates changeset files that get uploaded to S3, and our consumer applications are supposed to download these files and apply them in sequence to sync up to the correct state, but that is not happening.

There are some conflicting views on what the correct datestamps are and where they are represented. Our consumers were written to sort the downloaded files for processing by each S3 file's "LastModified" field, and I no longer know what this field represents. At first I thought it represented the modified/created date of the file we uploaded; then (as seen here) I read that it actually represents a new datestamp of when the file was uploaded, and the same link seems to imply that when a file is downloaded it reverts to the old datestamp (but I cannot confirm this).

We're using this snippet of code to pull files:

```csharp
// Get a list of the latest changesets since the last successful full update.
Amazon.S3.AmazonS3Client client = ...;

List<Amazon.S3.Model.S3Object> listObjects = client.GetFullObjectList(
    this.Settings.GetS3ListObjectsRequest(this.Settings.S3ChangesetSubBucket),
    Amazon.S3.AmazonS3Client.DateComparisonType.GreaterThan,
    lastModifiedDate,
    Amazon.S3.AmazonS3Client.StringTokenComparisonType.MustContainAll,
    this.Settings.RequiredChangesetPathTokens);
```

We then sort by the S3Object's LastModified value (which I suspect is where our assumption is wrong):

```csharp
foreach (Amazon.S3.Model.S3Object obj in listObjects)
{
    if (DateTime.Parse(obj.LastModified) > lastModifiedDate)
    {
        // It's a new file, so we use insertion sort to put this file in an ordered list
        // based on LastModified.
    }
}
```

Am I correct in assuming that we need to do more to preserve the datestamps we rely on, such as attaching custom headers/metadata with the correct datestamp to each file, or even putting the datestamp in the filename itself?
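If the datestamp goes into the filename, it helps to pick a key format whose lexicographic order matches generation order. The C# sketch below assumes a hypothetical `changesets/` prefix, a UTC generation time, and a producer-assigned sequence number to break ties; none of these names come from the original code.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: builds changeset keys whose ordinal (lexicographic)
// order matches the order in which the producer generated them.
public static class ChangesetKeys
{
    // Fixed-width UTC timestamp followed by a zero-padded sequence number.
    // "changesets/" is an illustrative prefix, not something S3 requires.
    public static string Build(DateTime generatedUtc, long sequence)
    {
        if (generatedUtc.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Timestamp must be UTC.", nameof(generatedUtc));

        string stamp = generatedUtc.ToString("yyyyMMdd'T'HHmmssfff'Z'", CultureInfo.InvariantCulture);
        return $"changesets/{stamp}-{sequence:D12}.json";
    }
}
```

Because every field is fixed-width, a plain string comparison of two keys gives the same answer as comparing their timestamps and then their sequence numbers.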

EDIT

Perhaps this question gets at my problem: if my service has 2 files to upload to S3 and uploads them one after the other, am I guaranteed that these files show up in S3 in the order they were uploaded (by LastModified), or does S3 do some amount of asynchronous processing that could make my files appear out of order in a list of S3 objects? I'm worried about a case where, for example, my service uploads file A and then B, B shows up in S3 first, my consumers get and process B, and then A shows up. My consumers may or may not pick up A, and if they do, they may process it incorrectly, thinking it's newer when it isn't.
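One way to make a consumer robust to the A/B scenario, assuming the producer can stamp each changeset with a strictly increasing sequence number (a hypothetical addition, not something S3 provides), is to apply only the contiguous run of sequence numbers that follows the last one applied:

```csharp
using System.Collections.Generic;

// Hypothetical consumer-side gate. If B (seq 2) becomes visible before
// A (seq 1), B is held back until A shows up, instead of being applied early.
public static class ChangesetGate
{
    public static List<long> NextApplicable(long lastApplied, IEnumerable<long> visibleSequences)
    {
        var visible = new HashSet<long>(visibleSequences);
        var ready = new List<long>();

        // Walk forward from the next expected number and stop at the first gap.
        long next = lastApplied + 1;
        while (visible.Contains(next))
        {
            ready.Add(next);
            next++;
        }
        return ready;
    }
}
```

With this gate, a changeset that never arrives blocks everything after it, so a real consumer would probably also want a timeout or alert for gaps that persist.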

EDIT 2

It was as the person who answered below and I suspected: we had race conditions from trying to apply changesets in order while blindly relying on S3's datestamps. As an addendum, we ended up making 2 fixes to address the problem, which might be useful for others as well:

First, to address the race condition between when our uploads finish and the modified dates reported by S3, we made all our queries look back 1 second from the last modified date we had read from a file pulled from S3. While examining this fix, we found another problem in S3 that wasn't apparent before: S3 does not preserve milliseconds on its timestamps, but instead rounds them up to the next second. Looking back 1 second circumvented this as well.
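The look-back can be sketched as follows, assuming (per the observation above) that S3 reports LastModified at whole-second resolution, so two files can carry the same timestamp. `ListingEntry` is a hypothetical stand-in for a listed object; the SDK's `S3Object` exposes `Key` and `LastModified` properties, but the type of `LastModified` has differed between SDK versions (the `DateTime.Parse` call in the question suggests a string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one entry of a bucket listing.
public sealed record ListingEntry(string Key, DateTime LastModified);

public static class LookBackQuery
{
    public static readonly TimeSpan LookBack = TimeSpan.FromSeconds(1);

    // Keeps entries strictly newer than (newest processed timestamp - LookBack).
    // With whole-second timestamps this also keeps entries that share the
    // newest processed second, which a plain "> lastProcessed" would drop.
    public static List<string> Candidates(IEnumerable<ListingEntry> listing, DateTime lastProcessed)
    {
        DateTime cutoff = lastProcessed - LookBack;
        return listing.Where(e => e.LastModified > cutoff)
                      .Select(e => e.Key)
                      .ToList();
    }
}
```

The trade-off is that files already processed in that last second are returned again, which is what the second fix deals with.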

Secondly, since we were looking back in time we would have the problem of downloading the same file multiple times if there weren't any new changeset files to download, so we added a filename buffer for files we saw in our last request, skipped any files we had already seen, and refreshed the buffer when we saw new files.
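A minimal version of that filename buffer might look like this, assuming the keys come from a look-back query like the one described above; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the filename buffer: remember the keys returned by the previous
// poll, skip them on the next poll, and replace the buffer whenever a poll
// turns up something new.
public sealed class SeenKeyBuffer
{
    private HashSet<string> _lastPoll = new HashSet<string>(StringComparer.Ordinal);

    // Returns only the keys that were not part of the previous poll.
    public List<string> FilterNew(IReadOnlyCollection<string> polledKeys)
    {
        var fresh = new List<string>();
        foreach (string key in polledKeys)
        {
            if (!_lastPoll.Contains(key))
                fresh.Add(key);
        }

        if (fresh.Count > 0)
        {
            // Refresh the buffer. Assuming the look-back window advances with the
            // newest processed timestamp, keys older than the current poll will
            // not be listed again, so only the current poll's keys are kept.
            _lastPoll = new HashSet<string>(polledKeys, StringComparer.Ordinal);
        }
        return fresh;
    }
}
```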

Hope this helps.

Solution

When you list the objects in an S3 bucket, the API response from S3 always returns them sorted by key in lexicographic (UTF-8 binary) order, i.e. alphabetically, not by upload time.
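Because the listing is ordered by key, any numbers embedded in a key only sort correctly when they are fixed-width. This C# sketch approximates the listing order with an ordinal sort, which matches UTF-8 binary order for ASCII keys; the key names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: approximates the order in which a listing returns keys.
// For plain ASCII keys, .NET ordinal order equals UTF-8 binary order.
public static class ListingOrder
{
    public static List<string> AsListed(IEnumerable<string> keys) =>
        keys.OrderBy(k => k, StringComparer.Ordinal).ToList();
}
```

For example, `changeset-10.json` is listed before `changeset-9.json`, while the zero-padded `changeset-0009.json` is listed before `changeset-0010.json`.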

The S3 API does not allow you to filter or sort objects based on the LastModified value. Any such filtering or sorting is done exclusively in the client libraries that you use to connect to S3.
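Since any ordering by LastModified happens on the client, a consumer that sorts this way might look like the sketch below. `ChangesetObject` is a hypothetical stand-in type. Ties, which are frequent at whole-second resolution, are broken by key so the order is at least deterministic, though it is still not necessarily the order in which the files were generated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a listed object.
public sealed record ChangesetObject(string Key, DateTime LastModified);

public static class ClientSideOrdering
{
    // Filter and sort after the listing has been fetched: S3 itself neither
    // filters nor sorts by LastModified.
    public static List<ChangesetObject> NewerThan(IEnumerable<ChangesetObject> listing, DateTime since) =>
        listing.Where(o => o.LastModified > since)
               .OrderBy(o => o.LastModified)
               .ThenBy(o => o.Key, StringComparer.Ordinal)
               .ToList();
}
```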

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html

As for the accuracy of the LastModified value and its possible use for sorting the list of objects by the time they were uploaded: to my knowledge, LastModified is set to the time the upload finishes (when the server returns a 200 OK response), not the time the upload started.

This means that if you start upload A, which is 100 MB in size, and one second later start upload B, which is only 1 KB, A's LastModified timestamp will end up later than B's.
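The scenario can be reproduced with a toy timeline, assuming LastModified equals the finish time as described above; the durations are invented stand-ins for a 100 MB and a 1 KB upload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model: each upload has a start time and a duration, and the
// "LastModified" it would receive is taken to be its finish time.
public sealed record SimulatedUpload(string Key, DateTime StartedUtc, TimeSpan Duration)
{
    public DateTime FinishedUtc => StartedUtc + Duration;
}

public static class UploadTimeline
{
    public static List<string> ByStart(IEnumerable<SimulatedUpload> uploads) =>
        uploads.OrderBy(u => u.StartedUtc).Select(u => u.Key).ToList();

    public static List<string> ByFinish(IEnumerable<SimulatedUpload> uploads) =>
        uploads.OrderBy(u => u.FinishedUtc).Select(u => u.Key).ToList();
}
```

Ordering by start gives A, B; ordering by finish (the LastModified a consumer would see) gives B, A.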

If you need to preserve the time your upload started, it's best to send it in a custom metadata header (an `x-amz-meta-*` header) with your original PUT request.
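A small sketch of writing and reading such a metadata value follows. Metadata values are strings, so the start time is stored in the culture-invariant round-trip (`"o"`) format. The header name is hypothetical. In the AWS SDK for .NET the value would typically be attached through `PutObjectRequest.Metadata` and read back from the object's metadata, but check the documentation for the SDK version in use.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for carrying the upload start time in user-defined
// metadata, which S3 stores as an x-amz-meta-* header.
public static class UploadStartMetadata
{
    // Illustrative name only; any x-amz-meta-* name would do.
    public const string Name = "x-amz-meta-upload-started-utc";

    // Converts to UTC (Unspecified values are treated as local time) and
    // writes e.g. "2014-01-01T12:30:15.1230000Z".
    public static string Format(DateTime started) =>
        started.ToUniversalTime().ToString("o", CultureInfo.InvariantCulture);

    // Parses the round-trip string back, preserving DateTimeKind.Utc.
    public static DateTime Parse(string value) =>
        DateTime.ParseExact(value, "o", CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
}
```

A consumer can then sort on the parsed start time (or a sequence number stored the same way) instead of on LastModified.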
