Retrieving subfolder names in an S3 bucket with boto3


Question

Using boto3, I can access my AWS S3 bucket:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')

Now, the bucket contains a folder first-level, which itself contains several subfolders named with a timestamp, for instance 1456753904534. I need the names of these subfolders for another job I'm doing, and I wonder whether boto3 can retrieve them for me.

So I tried:

objs = bucket.meta.client.list_objects(Bucket='my-bucket-name')

which gives a dictionary whose 'Contents' key lists all the third-level files instead of the second-level timestamp directories. In fact, I get a list containing entries such as:

{u'ETag': '"etag"', u'Key': 'first-level/1456753904534/part-00014', u'LastModified': datetime.datetime(2016, 2, 29, 13, 52, 24, tzinfo=tzutc()),
u'Owner': {u'DisplayName': 'owner', u'ID': 'id'},
u'Size': size, u'StorageClass': 'storageclass'}

As you can see, specific files are retrieved (in this case part-00014), while I'd like to get only the directory names. In principle I could strip the directory name out of every path, but it's ugly and expensive to retrieve everything at the third level just to get the second level!

I also tried what is reported here:

for o in bucket.objects.filter(Delimiter='/'):
    print(o.key)

but I do not get the folders at the desired level.

Is there a way to solve this?

Recommended answer

S3 is an object store; it doesn't have a real directory structure. The "/" is purely cosmetic. One reason people want a directory structure is that it lets them maintain, prune, or add to a tree in their application. In S3, you treat such a structure as a kind of index or search tag.

To manipulate objects in S3, you need boto3.client or boto3.resource, e.g. to list all objects:

import boto3

s3 = boto3.client("s3")
all_objects = s3.list_objects(Bucket='bucket-name')

http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.list_objects
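One caveat: a single list_objects (or list_objects_v2) call returns at most 1,000 keys, so listing every object in a larger bucket needs pagination. A minimal sketch, assuming a boto3 S3 client is passed in; the helper name iter_keys and its parameters are hypothetical, not part of boto3:

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key in `bucket` that starts with `prefix`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The paginator issues follow-up requests until all pages are consumed.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from the response when nothing matches.
        for obj in page.get("Contents", []):
            yield obj["Key"]

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "bucket-name"):
#       print(key)
```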

In fact, if the S3 object names are stored using the '/' separator, the more recent version of list_objects (list_objects_v2) allows you to limit the response to keys that begin with a specified prefix.

To limit the results to items under certain subfolders:

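import boto3

BUCKET = 'bucket-name'  # placeholder bucket name

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',  # add a trailing '/' to match only keys inside DIR2
    MaxKeys=100,
)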
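For the original question (the names of the timestamp subfolders under first-level), one possible approach is to combine Prefix with Delimiter='/'. S3 then groups keys that share the next path segment into the CommonPrefixes part of the response instead of listing each object. The following is a minimal sketch under that assumption; the helper name list_subfolders and its parameters are hypothetical, and the client is assumed to be a boto3 S3 client:

```python
def list_subfolders(client, bucket, prefix):
    """Return the names of the immediate "subfolders" under `prefix`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # A trailing slash makes the prefix match only keys inside the folder.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    paginator = client.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # "CommonPrefixes" is absent when no grouped prefixes exist on a page.
        for entry in page.get("CommonPrefixes", []):
            # entry["Prefix"] looks like "first-level/1456753904534/"
            names.append(entry["Prefix"][len(prefix):].rstrip("/"))
    return names

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(list_subfolders(boto3.client("s3"), "my-bucket-name", "first-level"))
```

Because only the grouped prefixes are returned, the third-level part files do not have to be listed to learn the second-level names.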


Another option is to use Python's os.path functions to extract the folder prefix. The problem is that this requires listing objects from undesired directories.

import os
s3_key = 'first-level/1456753904534/part-00014'
filename = os.path.basename(s3_key) 
foldername = os.path.dirname(s3_key)

# If the keys use an unconventional delimiter such as '#', split on it instead
s3_key = 'first-level#1456753904534#part-00014'
filename = s3_key.split("#")[-1]
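Applied to a whole listing, the same idea can collect the distinct second-level folder names. This is a rough sketch of the "strip the directory name from every path" approach mentioned in the question; the function name second_level_folders and its arguments are hypothetical, and it assumes the keys have already been listed:

```python
def second_level_folders(keys, top="first-level", delimiter="/"):
    """Return the sorted, de-duplicated folder names directly under `top`."""
    folders = set()
    for key in keys:
        parts = key.split(delimiter)
        # Need at least top/folder/file; skip keys outside `top`.
        if len(parts) >= 3 and parts[0] == top:
            folders.add(parts[1])
    return sorted(folders)
```

This still pays the cost of listing every object, which is the drawback noted above.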

A reminder about boto3: boto3.resource is a nice high-level API, and there are pros and cons to using boto3.client versus boto3.resource. If you develop an internal shared library, using boto3.resource gives you a black-box layer over the resources used.
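As an illustration of the resource style, the prefix filter can also be written against a Bucket resource. This is only a sketch: the helper name keys_under is hypothetical, and bucket is assumed to be a boto3 Bucket resource such as boto3.resource('s3').Bucket('my-bucket-name'). The collection yields object summaries, so, as far as I know, folder-style grouping via CommonPrefixes is something you get from the client call rather than from this interface.

```python
def keys_under(bucket, prefix):
    """Return the keys of all objects in `bucket` that start with `prefix`.

    `bucket` is assumed to be a boto3 Bucket resource; the collection
    handles pagination internally while it is iterated.
    """
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   bucket = boto3.resource("s3").Bucket("my-bucket-name")
#   print(keys_under(bucket, "first-level/"))
```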
