Get day old filepaths from s3 bucket

Question

I have a bunch of files in an s3 bucket with prefixes like the example below. I would like to connect with boto3 and create a list of all prefixes in the bucket that have a date part older than a day. So for example if the current date was

'20191226_1213'

then I would like to create a list like the desired output below. Can anyone suggest how to do this with boto3?

Example:

's3://basepath/20191225_1217/'
's3://basepath/20191224_1012/'
's3://basepath/20191222_1114/'

Desired output:

['s3://basepath/20191224_1012/','s3://basepath/20191222_1114/']

Update:

I'm sorry I didn't provide a better example before but my real folder path actually looks like:

's3://basepath/folder1/20191225_1217/'

Recommended answer

Here's some code that lists the common prefixes (the top-level "folders") in the root of the given bucket and checks their names against "one day ago":

import boto3
import datetime

s3_client = boto3.client('s3')

now = datetime.datetime.now()
comparison_time = now - datetime.timedelta(days=1)
comparison_time_string = comparison_time.strftime("%Y%m%d_%H%M") # eg 20191225_0623

response = s3_client.list_objects_v2(Bucket='my-bucket', Delimiter='/')

for prefix_dict in response.get('CommonPrefixes', []):
    prefix = prefix_dict['Prefix']  # eg 20191225_1217/
    if prefix < comparison_time_string:  # zero-padded names sort chronologically
        print(prefix)
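The plain string comparison works because the folder names are zero-padded and fixed-width, so they sort in the same order as the times they represent. Here is a minimal sketch of that check using the sample folder names from the question and a fixed "now", with no S3 call involved. The helper name `older_than` is made up for illustration.

```python
import datetime

def older_than(prefixes, now, days=1):
    """Return the prefixes whose YYYYMMDD_HHMM name is before now - days."""
    cutoff = (now - datetime.timedelta(days=days)).strftime("%Y%m%d_%H%M")
    # Zero-padded, fixed-width timestamps sort chronologically as plain strings
    return [p for p in prefixes if p < cutoff]

prefixes = ['20191225_1217/', '20191224_1012/', '20191222_1114/']
now = datetime.datetime(2019, 12, 26, 12, 13)  # the "current date" from the question
print(older_than(prefixes, now))  # ['20191224_1012/', '20191222_1114/']
```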

However, be careful about the time definitions. Depending on where you run the code, the timezone might (or might not) be set to UTC. This might, or might not, match whatever is generating those dates and times on the folder names.
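One way to take the machine's local timezone out of the picture is to build the cutoff from an explicit UTC clock. This is only a sketch and assumes the folder names were written in UTC. If they were written in some other zone, the cutoff would need that zone instead. `cutoff_string` is a hypothetical helper name.

```python
import datetime

def cutoff_string(now=None, days=1):
    """Build the YYYYMMDD_HHMM cutoff from an explicit UTC clock."""
    if now is None:
        # Assumption: the folder names were generated in UTC
        now = datetime.datetime.now(datetime.timezone.utc)
    return (now - datetime.timedelta(days=days)).strftime("%Y%m%d_%H%M")

print(cutoff_string())  # depends on when it is run

fixed = datetime.datetime(2019, 12, 26, 12, 13, tzinfo=datetime.timezone.utc)
print(cutoff_string(fixed))  # 20191225_1213
```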

Update: Here's another version that looks for the date string in any part of the Key, then outputs the Key up to the folder name.

import boto3
import datetime
import re

s3_client = boto3.client('s3')

now = datetime.datetime.now()
comparison_time = now - datetime.timedelta(days=1)
comparison_time_string = comparison_time.strftime("%Y%m%d_%H%M") # eg 20191225_0623

response = s3_client.list_objects_v2(Bucket='my-bucket')  # returns at most 1000 keys per call

pattern = re.compile(r'/(\d{8}_\d{4})/') # eg /20191225_0623/

old_objects = []

for obj in response.get('Contents', []):
    key = obj['Key']
    result = pattern.search(key)
    if result and result.group(1) < comparison_time_string:
        old_objects.append(key[:result.end()])  # key up to and including the dated folder

print(old_objects)
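Two things to watch with the version above: a single `list_objects_v2` call returns at most 1000 keys, and a folder that holds several objects is appended once per object. A possible variation, sketched below, follows pagination with boto3's `list_objects_v2` paginator and collects each folder only once. The helper names `old_folders` and `iter_keys` and the sample keys are made up for illustration, and the bucket name is a placeholder.

```python
import re

DATE_FOLDER = re.compile(r'/(\d{8}_\d{4})/')  # eg /20191225_0623/

def old_folders(keys, cutoff):
    """Return each distinct folder prefix whose date part sorts before cutoff."""
    found = set()
    for key in keys:
        match = DATE_FOLDER.search(key)
        if match and match.group(1) < cutoff:
            found.add(key[:match.end()])
    return sorted(found)

def iter_keys(bucket):
    """Yield every key in the bucket, page by page (needs boto3 and credentials)."""
    import boto3  # imported here so old_folders can be tried without AWS access
    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            yield obj['Key']

# Made-up keys for illustration only
sample_keys = [
    'folder1/20191225_1217/a.csv',
    'folder1/20191224_1012/a.csv',
    'folder1/20191224_1012/b.csv',
    'folder1/20191222_1114/a.csv',
]
print(old_folders(sample_keys, '20191225_1213'))
# ['folder1/20191222_1114/', 'folder1/20191224_1012/']
```

Against a real bucket the call would look like `old_folders(iter_keys('my-bucket'), comparison_time_string)`.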
