Boto3 从 S3 Bucket 下载所有文件 [英] Boto3 to download all files from a S3 Bucket
问题描述
我正在使用 boto3 从 s3 存储桶中获取文件.我需要一个类似的功能,比如 aws s3 sync
I'm using boto3 to get files from s3 bucket. I need a similar functionality like aws s3 sync
我当前的代码是
#!/usr/bin/python
import boto3
s3=boto3.client('s3')
list=s3.list_objects(Bucket='my_bucket_name')['Contents']
for key in list:
s3.download_file('my_bucket_name', key['Key'], key['Key'])
这工作正常,只要存储桶只有文件.如果存储桶中存在文件夹,则会引发错误
This is working fine, as long as the bucket has only files. If a folder is present inside the bucket, its throwing an error
Traceback (most recent call last):
File "./test", line 6, in <module>
s3.download_file('my_bucket_name', key['Key'], key['Key'])
File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 58, in download_file
extra_args=ExtraArgs, callback=Callback)
File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 651, in download_file
extra_args, callback)
File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 666, in _download_file
self._get_object(bucket, key, filename, extra_args, callback)
File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 690, in _get_object
extra_args, callback)
File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 707, in _do_get_object
with self._osutil.open(filename, 'wb') as f:
File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 323, in open
return open(filename, mode)
IOError: [Errno 2] No such file or directory: 'my_folder/.8Df54234'
这是使用 boto3 下载完整 s3 存储桶的正确方法吗?如何下载文件夹.
Is this a proper way to download a complete s3 bucket using boto3. How to download folders.
推荐答案
当处理具有 1000 多个对象的存储桶时,有必要实现一个解决方案,该解决方案在最多连续的集合上使用 NextContinuationToken
, 1000 个键.此解决方案首先编译对象列表,然后迭代创建指定目录并下载现有对象.
When working with buckets that have 1000+ objects its necessary to implement a solution that uses the NextContinuationToken
on sequential sets of, at most, 1000 keys. This solution first compiles a list of objects then iteratively creates the specified directories and downloads the existing objects.
import boto3
import os
s3_client = boto3.client('s3')
def download_dir(prefix, local, bucket, client=s3_client):
"""
params:
- prefix: pattern to match in s3
- local: local path to folder in which to place files
- bucket: s3 bucket with target contents
- client: initialized s3 client object
"""
keys = []
dirs = []
next_token = ''
base_kwargs = {
'Bucket':bucket,
'Prefix':prefix,
}
while next_token is not None:
kwargs = base_kwargs.copy()
if next_token != '':
kwargs.update({'ContinuationToken': next_token})
results = client.list_objects_v2(**kwargs)
contents = results.get('Contents')
for i in contents:
k = i.get('Key')
if k[-1] != '/':
keys.append(k)
else:
dirs.append(k)
next_token = results.get('NextContinuationToken')
for d in dirs:
dest_pathname = os.path.join(local, d)
if not os.path.exists(os.path.dirname(dest_pathname)):
os.makedirs(os.path.dirname(dest_pathname))
for k in keys:
dest_pathname = os.path.join(local, k)
if not os.path.exists(os.path.dirname(dest_pathname)):
os.makedirs(os.path.dirname(dest_pathname))
client.download_file(bucket, k, dest_pathname)
这篇关于Boto3 从 S3 Bucket 下载所有文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!