How to get subkeys to iterate over and eventually the files inside them in AWS S3

Question
I have an AWS S3 key path bucket-name/fo1/fo2/fo3 that has subpaths bucket-name/fo1/fo2/fo3/fo_1, bucket-name/fo1/fo2/fo3/fo_2, bucket-name/fo1/fo2/fo3/fo_3 and so on. I want to iterate over these keys fo_1, fo_2, fo_3, etc. within the path bucket-name/fo1/fo2/fo3.
I tried the following, but it doesn't work:

import boto3

s3 = boto3.client('s3')
s3_bucket = 'bucket-name'
prefix = 'fo1/fo2/fo3'
for obj in s3.list_objects_v2(Bucket=s3_bucket, Prefix=prefix, Delimiter='/'):
    # Here, when I print obj, it's a string with value 'MaxKeys'
    print(obj)
Any help will be appreciated!
UPDATE:
s3://bucket-name/
fo1/
fo2/
fo3/
fo_1/
file1
...
fo_2/
file2
...
fo_3/
file1
...
fo_4/
file1
...
...
This is my structure, and I am looking to get fo_1, fo_2, fo_3 and the files inside them. I want everything inside fo3 and nothing outside of it.
The first thing to understand about Amazon S3 is that folders do not exist. Rather, objects are stored with their full path as their Key (filename).
For example, I could copy a file to a bucket using the AWS Command-Line Interface (CLI):
aws s3 cp foo.txt s3://my-bucket/fo1/fo2/fo3/foo.txt
This would work even though the folders do not exist.
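The same copy can be sketched with boto3 (the helper names below are hypothetical, and the bucket/key values are taken from the CLI example above); it likewise succeeds even though no folder objects exist:

```python
def object_key(*parts):
    # A Key is just a string; joining parts with '/' "creates" the
    # folder hierarchy without any folder objects existing.
    return '/'.join(parts)

def upload(bucket, local_path, *key_parts):
    # Hypothetical helper: upload a local file under the joined Key.
    import boto3  # imported here so object_key() stays usable offline
    boto3.client('s3').upload_file(local_path, bucket, object_key(*key_parts))

# upload('my-bucket', 'foo.txt', 'fo1', 'fo2', 'fo3', 'foo.txt')
```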
To make things convenient for humans, a "pretend" set of folders is provided via the concept of a common prefix. Thus, in the management console, the folders would appear to be there. However, if the object were then deleted with:

aws s3 rm s3://my-bucket/fo1/fo2/fo3/foo.txt

the folders would immediately disappear, because they never actually existed!
Also for convenience, some Amazon S3 commands allow you to specify a Prefix and a Delimiter. This can be used to, for example, list only the objects in the fo3 folder. What it is really doing is merely listing the objects that have a Key starting with fo1/fo2/fo3/. When the Key for an object is returned, it will always be the full path to the object, because the Key actually is the full path. (There is no concept of a filename separate from the complete Key.)
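This also explains why the question's loop prints strings like 'MaxKeys': list_objects_v2 returns a dictionary, so iterating over it yields its keys. A sketch of the intended listing, assuming the question's bucket-name (the Prefix must end with '/' for the Delimiter to group one level down; subfolders is a hypothetical helper name):

```python
def subfolders(response):
    # The folder-like entries come back under 'CommonPrefixes', not
    # 'Contents'; each is a dict like {'Prefix': 'fo1/fo2/fo3/fo_1/'}.
    return [cp['Prefix'] for cp in response.get('CommonPrefixes', [])]

def list_subfolders(bucket, prefix):
    # List the "pretend" folders one level below `prefix`.
    import boto3  # imported here so subfolders() stays testable offline
    s3 = boto3.client('s3')
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter='/')
    return subfolders(response)

# list_subfolders('bucket-name', 'fo1/fo2/fo3/')
# would return e.g. ['fo1/fo2/fo3/fo_1/', 'fo1/fo2/fo3/fo_2/', ...]
```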
So, if you want a listing of all files in fo1 and fo2 and fo3, you can do a listing with a Prefix of fo1 and receive back all objects that start with fo1/, but this will include objects in sub-folders, since they all have a prefix of fo1/.
Bottom line: Rather than thinking of old-fashioned directories, think of Amazon S3 as a flat storage structure. If necessary, you can do filtering of results in your own code.
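Such filtering can be sketched as follows (list_all_keys and keys_under are hypothetical helper names; a paginator is used because list_objects_v2 returns at most 1000 keys per call):

```python
def keys_under(keys, prefix):
    # Pure filtering over a flat listing: keep only keys below `prefix`.
    return [k for k in keys if k.startswith(prefix)]

def list_all_keys(bucket, prefix=''):
    # Fetch every Key under `prefix`, following pagination.
    import boto3  # imported here so keys_under() stays testable offline
    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys

# keys = list_all_keys('bucket-name', 'fo1/')
# fo3_keys = keys_under(keys, 'fo1/fo2/fo3/')
```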