Download Entire Content of a subfolder in an S3 bucket
Question
I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z". Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all of its contents?
For example -> sample-data/a/foo.txt, more_files/foo1.txt
In the above example, the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files, which contains foo1.txt.
I know how to download a single file. For instance, if I wanted foo.txt, I would do the following:
import boto3

s3 = boto3.client('s3')
s3.download_file("sample-data", "a/foo.txt", "foo.txt")
However, I am wondering if I can download the folder called a and all of its contents entirely? Any help would be appreciated.
Answer
I think your best bet would be the awscli:

aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
From the documentation:

--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
To do this with boto3, try the following:
import os
import errno
import boto3

client = boto3.client('s3')


def assert_dir_exists(path):
    # Create the directory if it does not already exist
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise


def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'
    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually; a page with no matches has no 'Contents'
        for key in result.get('Contents', []):
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip keys ending in / (zero-byte "folder" placeholders)
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure the local directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)


download_dir('your_bucket', 'your_folder', 'destination')
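The key step in download_dir is mapping each S3 key to a local path relative to the prefix. A minimal, self-contained sketch of just that mapping (no AWS access needed; the keys and the dest directory are made-up examples):

```python
import os

def key_to_local_path(key, prefix, target):
    # Map an S3 key under `prefix` to a path under the local `target` directory.
    # Returns None for keys ending in '/', which download_dir skips
    # (they are zero-byte "folder" placeholders, not real files).
    if not prefix.endswith('/'):
        prefix += '/'
    if key.endswith('/'):
        return None
    rel_path = key[len(prefix):]
    return os.path.join(target, rel_path)

print(key_to_local_path('a/foo.txt', 'a', 'dest'))
print(key_to_local_path('a/more_files/foo1.txt', 'a', 'dest'))
print(key_to_local_path('a/more_files/', 'a', 'dest'))
```

As an aside, on Python 3 the assert_dir_exists helper can be replaced by os.makedirs(path, exist_ok=True), which tolerates an already-existing directory without the errno check.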