How can I copy files bigger than 5 GB in Amazon S3?


Question


The Amazon S3 REST API documentation says there's a 5 GB size limit for uploads in a single PUT operation. Files bigger than that have to be uploaded using the multipart upload API. Fine.

However, what I need in essence is to rename files that might be bigger than that. As far as I know there's no rename or move operation, so I have to copy the file to the new location and delete the old one. How exactly is that done with files bigger than 5 GB? Do I have to do a multipart upload from the bucket to itself? In that case, how does splitting the file into parts work?

From reading boto's source, it doesn't seem like it does anything like this automatically for files bigger than 5 GB. Is there any built-in support that I missed?

Solution

As far as I know there's no rename or move operation, therefore I have to copy the file to the new location and delete the old one.

That's correct; it's pretty easy to do for objects/files smaller than 5 GB by means of a PUT Object - Copy operation, followed by a DELETE Object operation (both of which are supported in boto, of course; see copy_key() and delete_key()):

This implementation of the PUT operation creates a copy of an object that is already stored in Amazon S3. A PUT copy operation is the same as performing a GET and then a PUT. Adding the request header, x-amz-copy-source, makes the PUT operation copy the source object into the destination bucket.
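
Put together, a sub-5 GB "rename" with boto might look like the following minimal sketch (the credentials, bucket names, and key names are placeholders, not anything from the answer itself):

import boto

s3 = boto.connect_s3('access', 'secret')   # placeholder credentials
src = s3.get_bucket('source_bucket')
dst = s3.get_bucket('destination_bucket')

# PUT Object - Copy: server-side copy of the object to its new location
dst.copy_key('path/to/new/key', 'source_bucket', 'path/to/source/key')

# DELETE Object: remove the original to complete the "rename"
src.delete_key('path/to/source/key')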

However, that's indeed not possible for objects/files greater than 5 GB:

Note
[...] You create a copy of your object up to 5 GB in size in a single atomic operation using this API. However, for copying an object greater than 5 GB, you must use the multipart upload API. For conceptual information [...], go to Uploading Objects Using Multipart Upload [...] [emphasis mine]

Boto meanwhile supports this as well by means of the copy_part_from_key() method; unfortunately, the required approach isn't documented outside of the respective pull request #425 (allow for multi-part copy commands), and I haven't tried this myself yet:

import boto
s3 = boto.connect_s3('access', 'secret')
b = s3.get_bucket('destination_bucket')
mp = b.initiate_multipart_upload('tmp/large-copy-test.mp4')
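# each copy_part_from_key(src_bucket_name, src_key_name, part_num, start, end) call below
# copies one ~1 GB byte range of the source object server-side; the byte offsets are inclusive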
mp.copy_part_from_key('source_bucket', 'path/to/source/key', 1, 0, 999999999)
mp.copy_part_from_key('source_bucket', 'path/to/source/key', 2, 1000000000, 1999999999)
mp.copy_part_from_key('source_bucket', 'path/to/source/key', 3, 2000000000, 2999999999)
mp.copy_part_from_key('source_bucket', 'path/to/source/key', 4, 3000000000, 3999999999)
mp.copy_part_from_key('source_bucket', 'path/to/source/key', 5, 4000000000, 4999999999)
mp.copy_part_from_key('source_bucket', 'path/to/source/key', 6, 5000000000, 5500345712)
mp.complete_upload()
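
If you'd rather not hard-code the byte ranges, they can be derived from the source object's size; here is a minimal, untested sketch along the same lines (same placeholder names as above, 1 GB parts; note that every part except the last must be at least 5 MB):

import math
import boto

s3 = boto.connect_s3('access', 'secret')
src = s3.get_bucket('source_bucket')
dst = s3.get_bucket('destination_bucket')

source_key = src.get_key('path/to/source/key')        # look up the source object's size
part_size = 1024 ** 3                                  # 1 GiB per part

mp = dst.initiate_multipart_upload('tmp/large-copy-test.mp4')
part_count = int(math.ceil(source_key.size / float(part_size)))
for i in range(part_count):
    start = i * part_size
    end = min(start + part_size, source_key.size) - 1  # byte ranges are inclusive
    mp.copy_part_from_key('source_bucket', 'path/to/source/key', i + 1, start, end)
mp.complete_upload()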

You might eventually want to study the respective samples on how to achieve this in Java or .NET, which might provide more insight into the general approach; see Copying Objects Using the Multipart Upload API.

Good luck!


Appendix

Please be aware of the following peculiarity regarding copying in general, which is easily overlooked:

When copying an object, you can preserve most of the metadata (default) or specify new metadata. However, the ACL is not preserved and is set to private for the user making the request. To override the default ACL setting, use the x-amz-acl header to specify a new ACL when generating a copy request. For more information, see Amazon S3 ACLs. [emphasis mine]
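
In boto terms, one way to deal with this might be the sketch below; this is my own assumption rather than something from the answer above (preserve_acl, the headers argument, and the policy keyword are boto 2.x features as I understand them, and 'public-read' is just an example canned ACL):

import boto

s3 = boto.connect_s3('access', 'secret')
dst = s3.get_bucket('destination_bucket')

# single-operation copy: re-apply the source object's ACL on the copy ...
dst.copy_key('path/to/new/key', 'source_bucket', 'path/to/source/key', preserve_acl=True)

# ... or set a canned ACL explicitly (sent as the x-amz-acl request header)
dst.copy_key('path/to/new/key', 'source_bucket', 'path/to/source/key',
             headers={'x-amz-acl': 'public-read'})

# for multipart copies, a canned ACL can (as far as I know) be supplied when initiating the upload
mp = dst.initiate_multipart_upload('tmp/large-copy-test.mp4', policy='public-read')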
