Download the Entire Contents of a Subfolder in an S3 Bucket


Question

I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".

Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all its contents?

For example -> sample-data/a/foo.txt, more_files/foo1.txt

In the above example the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files, which in turn contains foo1.txt.

I know how to download a single file. For instance, if I wanted foo.txt I would do the following:

    import boto3

    s3 = boto3.client('s3')
    s3.download_file("sample-data", "a/foo.txt", "foo.txt")

However, I am wondering if I can download the folder called a and all of its contents in one go? Any help would be appreciated.

Answer

I think your best bet would be the awscli:

aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination

From the docs:

--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.

To do this with boto3, try this:

import os
import errno
import boto3

client = boto3.client('s3')


def assert_dir_exists(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise


def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'

    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each object individually ('Contents' is absent
        # when the prefix matches no objects)
        for key in result.get('Contents', []):
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)


download_dir('your_bucket', 'your_folder', 'destination')
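The key-to-local-path mapping inside download_dir can be checked in isolation, without touching S3. Below is a minimal sketch of that logic; key_to_local_path is a hypothetical helper written for illustration, not part of boto3:

```python
import os


def key_to_local_path(key, prefix, target):
    # Mirror the mapping used in download_dir: normalize the prefix
    # with a trailing slash, strip it from the object key, then join
    # the remainder onto the local target directory.
    if not prefix.endswith('/'):
        prefix += '/'
    rel_path = key[len(prefix):]
    return os.path.join(target, rel_path)


print(key_to_local_path('a/foo.txt', 'a', 'destination'))
print(key_to_local_path('a/more_files/foo1.txt', 'a', 'destination'))
```

Note that download_dir's assert_dir_exists helper can also be replaced by os.makedirs(path, exist_ok=True) on Python 3.2+, which is the idiomatic way to tolerate an already-existing directory.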

