Boto3 S3: Get files without getting folders


Question

Using boto3, how can I retrieve all files in my S3 bucket without retrieving the folders?

Consider the following file structure:

file_1.txt
folder_1/
    file_2.txt
    file_3.txt
    folder_2/
        folder_3/
            file_4.txt

In this example I'm only interested in the 4 files.

A manual solution is:

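def count_files_in_folder(prefix):
    total = 0
    # Note: a single list_objects call returns at most 1000 keys.
    response = s3_client.list_objects(Bucket=bucket_name, Prefix=prefix)
    # 'Contents' is missing from the response when no key matches the prefix.
    for obj in response.get('Contents', []):
        # Keys ending in '/' are the "folder" placeholders; skip them.
        if not obj['Key'].endswith('/'):
            total += 1
    return total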

In this case total would be 4.

If I just did

count = len(s3_client.list_objects(Bucket=bucket_name, Prefix=prefix)['Contents'])

the result would be 7 objects (4 files and 3 folders):

file_1.txt
folder_1/
folder_1/file_2.txt
folder_1/file_3.txt
folder_1/folder_2/
folder_1/folder_2/folder_3/
folder_1/folder_2/folder_3/file_4.txt

I just want:

file_1.txt
folder_1/file_2.txt
folder_1/file_3.txt  
folder_1/folder_2/folder_3/file_4.txt

Recommended answer

S3 is an object store. It does not store files or objects in a directory tree. Newcomers are often confused by the "folder" option they are given, which is in fact just an arbitrary prefix on the object key.

An object prefix is a way to retrieve objects that are organised by a predefined, fixed key-name prefix structure.

Imagine a file system that does not let you create directories, but does let you put a slash "/" or backslash "\" in a file name as a delimiter, so that you can denote the "level" of a file by a common prefix.

Thus in S3, you can use any of the following key names to "simulate" a directory that is not really a directory.

folder1-folder2-folder3-myobject
folder1/folder2/folder3/myobject
folder1\folder2\folder3\myobject

As you can see, an object name can be stored in S3 regardless of which arbitrary folder separator (delimiter) you use.
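To make the "folders are only prefixes" point concrete, here is a minimal sketch in plain Python (no AWS calls) that imitates the kind of grouping S3 performs when a delimiter is passed to a list call. The helper name `list_level` and the sample keys are assumptions made up for illustration; they are not part of boto3.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into direct "files" and one-level "folders".

    Hypothetical helper: it only imitates, on a plain list of key names,
    how keys can be rolled up by a common prefix and a delimiter.
    """
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # The delimiter occurs after the prefix: roll up into a common prefix.
            folders.add(prefix + head + sep)
        else:
            files.append(key)
    return files, sorted(folders)


keys = [
    "file_1.txt",
    "folder_1/file_2.txt",
    "folder_1/file_3.txt",
    "folder_1/folder_2/folder_3/file_4.txt",
]
top_files, top_folders = list_level(keys)
sub_files, sub_folders = list_level(keys, prefix="folder_1/")
# The same logic works with any delimiter, for example "-".
dash_files, dash_folders = list_level(["a-b-c", "a-d", "e"], delimiter="-")
```

Nothing here depends on "/" being special: the "folder" only appears once you pick a delimiter and group keys by it.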

However, to help users make bulk file transfers to S3, tools such as the AWS CLI and the s3transfer API try to simplify the process by creating object names that follow your local folder structure.

So if you are sure that all of your S3 objects use / or \ as the separator, you can use tools like s3transfer or the AWS CLI to do a simple download by key name.
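As a rough illustration of going the other way, from key name back to a local path, here is a small sketch. It assumes every key uses "/" as its separator; the helper name `local_path_for` and the `downloads` default directory are hypothetical, and the function does not guard against ".." components in a key.

```python
from pathlib import Path, PurePosixPath


def local_path_for(key, dest_root="downloads"):
    """Return the local Path for an S3 key, or None for a "folder" placeholder.

    Hypothetical helper; assumes "/" is the delimiter used in every key.
    """
    if key.endswith("/"):
        # Keys ending in "/" are zero-byte "folder" markers, not files.
        return None
    # Drop any leading "/" so the result always stays under dest_root.
    parts = PurePosixPath(key.lstrip("/")).parts
    return Path(dest_root).joinpath(*parts)
```

For example, `local_path_for("folder_1/file_2.txt")` gives a path under `downloads/folder_1/`, while `local_path_for("folder_1/")` gives `None`, so the caller can skip it.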

Here is some quick and dirty code using the resource iterator. The iterator returned by the bucket's objects.filter() pages through the results for you, so it is not subject to the 1000-key limit of a single list_objects()/list_objects_v2() call.

import os
import boto3

s3 = boto3.resource('s3')
mybucket = s3.Bucket("mybucket")
# If a blank prefix is given, everything in the bucket is returned.
# Keys normally do not start with "/", so the prefix has no leading slash.
bucket_prefix = "some/prefix/here"
objs = mybucket.objects.filter(Prefix=bucket_prefix)

for obj in objs:
    # Skip "folder" placeholder keys, which end with a slash.
    if obj.key.endswith('/'):
        continue
    path, filename = os.path.split(obj.key)
    # download_file raises an exception if the local folder does not exist.
    if path:
        os.makedirs(path, exist_ok=True)
    mybucket.download_file(obj.key, obj.key)
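If you only need the count from the question, and want to stay on the client API, a paginator is one way around the 1000-key limit. The following is a minimal sketch, not a definitive implementation: `iter_file_keys` and `count_files` are names invented here, and the filter assumes that "folders" are exactly the keys ending in "/".

```python
def iter_file_keys(pages):
    """Yield keys that do not end with '/' from an iterable of list responses."""
    for page in pages:
        # 'Contents' is absent from a page when nothing matches the prefix.
        for entry in page.get("Contents", []):
            if not entry["Key"].endswith("/"):
                yield entry["Key"]


def count_files(bucket_name, prefix=""):
    """Count non-folder keys under `prefix`, following pagination."""
    import boto3  # imported here so iter_file_keys can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
    return sum(1 for _ in iter_file_keys(pages))
```

Calling `count_files("mybucket", "folder_1/")` (with a hypothetical bucket name) would then count only the keys that look like files. Keeping the filter separate from the AWS call also makes it easy to check against hand-written response dictionaries.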
