How to get subkeys to iterate over and eventually the files inside them in AWS S3


Problem description


I have an AWS S3 key path bucket-name/fo1/fo2/fo3 with subpaths bucket-name/fo1/fo2/fo3/fo_1, bucket-name/fo1/fo2/fo3/fo_2, bucket-name/fo1/fo2/fo3/fo_3 and so on. I want to iterate over the keys fo_1, fo_2, fo_3 etc. within the path bucket-name/fo1/fo2/fo3.

I tried the following but this doesn't work.

import boto3

s3 = boto3.client('s3')
s3_bucket = 'bucket-name'

prefix = 'fo1/fo2/fo3'
for obj in s3.list_objects_v2(Bucket=s3_bucket, Prefix=prefix, Delimiter='/'):
    # Here when I print obj, it's a string with value as 'MaxKeys'
    print(obj)

Any help will be appreciated!

UPDATE:

s3://bucket-name/
        fo1/
           fo2/
              fo3/
                 fo_1/
                     file1
                     ...
                 fo_2/
                     file2
                     ...
                 fo_3/
                     file1
                     ...
                 fo_4/
                     file1
                     ...
                 ...

This is my structure and I am looking to get fo_1, fo_2, fo_3 and files inside it. I want everything inside object fo3 and nothing outside of that.

Solution

The first thing to understand about Amazon S3 is that folders do not exist. Rather, objects are stored with their full path as their Key (filename).

For example, I could copy a file to a bucket using the AWS Command-Line Interface (CLI):

aws s3 cp foo.txt s3://my-bucket/fo1/fo2/fo3/foo.txt

This would work even though the folders do not exist.

To make things convenient for humans, a "pretend" set of folders is provided via the concept of a common prefix. Thus, in the management console, the folders would appear to be there. However, if the object were then deleted with:

aws s3 rm s3://my-bucket/fo1/fo2/fo3/foo.txt

The result is that the folders would immediately disappear because they never actually existed!

Also for convenience, some Amazon S3 commands allow you to specify a Prefix and Delimiter. This can be used, for example, to list only the objects in the fo3 folder. What it is really doing is merely listing the objects that have a Key that starts with fo1/fo2/fo3/. When the Key for the object is returned, it will always have the full path to the object, because the Key actually is the full path. (There is no concept of a filename separate from the complete Key.)
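As a minimal sketch of the Prefix and Delimiter behaviour described above, assuming boto3 and its list_objects_v2 paginator: the response is a dictionary, so looping over it directly yields its keys (such as 'MaxKeys') rather than objects. With a Delimiter of '/', deeper keys are grouped under 'CommonPrefixes', while objects directly under the prefix appear in 'Contents'. The function name list_immediate_children and the client parameter are illustrative choices, not part of boto3.

```python
def list_immediate_children(client, bucket, prefix):
    """Return (subfolder_prefixes, object_keys) found directly under `prefix`.

    `client` is assumed to be an S3 client, e.g. boto3.client('s3').
    """
    # A trailing slash makes 'fo1/fo2/fo3' behave like a folder, so the
    # delimiter splits on the level below it rather than on 'fo3' itself.
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    subfolders, keys = [], []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # 'CommonPrefixes' holds the pretend sub-folders, e.g. 'fo1/fo2/fo3/fo_1/'.
        for common_prefix in page.get("CommonPrefixes", []):
            subfolders.append(common_prefix["Prefix"])
        # 'Contents' holds objects sitting directly under the prefix.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return subfolders, keys
```

Called as list_immediate_children(boto3.client('s3'), 'bucket-name', 'fo1/fo2/fo3'), this would be expected to return prefixes such as fo1/fo2/fo3/fo_1/ for the structure in the question; each returned prefix can be passed back in to list the files inside it.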

So, if you want a listing of all files in fo1 and fo2 and fo3, you can do a listing with a Prefix of fo1/ and receive back all objects whose Key starts with fo1/. This will include objects in sub-folders, since their Keys all start with fo1/ too.
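A recursive listing of this kind can be sketched as follows, again assuming boto3's list_objects_v2 paginator. No Delimiter is passed, so every Key under the prefix comes back regardless of depth. The name iter_keys and the client parameter are hypothetical; the trailing slash is added so that a prefix such as fo1 does not also match a sibling like fo10/.

```python
def iter_keys(client, bucket, prefix):
    """Yield every object Key that starts with `prefix`, at any depth.

    `client` is assumed to be an S3 client, e.g. boto3.client('s3').
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    paginator = client.get_paginator("list_objects_v2")
    # Each page holds a limited batch of keys; the paginator keeps
    # requesting pages until the listing is complete.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For the question's layout, iter_keys(boto3.client('s3'), 'bucket-name', 'fo1/fo2/fo3') would be expected to yield full Keys such as fo1/fo2/fo3/fo_1/file1, and nothing outside fo3.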

Bottom line: Rather than thinking of old-fashioned directories, think of Amazon S3 as a flat storage structure. If necessary, you can do filtering of results in your own code.
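Filtering in your own code might look like the sketch below, which takes a flat list of Keys and groups them by the first path segment under a prefix (fo_1, fo_2, and so on). It is plain Python with no AWS calls; the name group_by_subfolder and the choice to skip Keys ending in '/' (treated here as folder placeholders) are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_subfolder(keys, prefix):
    """Group flat S3 Keys by the first path segment below `prefix`.

    Returns a dict mapping sub-folder name to the list of paths inside it.
    Files sitting directly under `prefix` are grouped under the name ''.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    groups = defaultdict(list)
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        relative = key[len(prefix):]
        if not relative or relative.endswith("/"):
            continue  # the prefix itself, or a folder placeholder
        head, sep, rest = relative.partition("/")
        if sep:
            groups[head].append(rest)  # e.g. 'fo_1' -> 'file1'
        else:
            groups[""].append(head)    # file directly under the prefix
    return dict(groups)
```

Grouping this way keeps the flat view of S3 intact: the "folders" only exist as a grouping computed from the Keys.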
