AWS CLI S3API find newest folder in path

Question

I've got a very large bucket (hundreds of thousands of objects). I've got a path (let's say s3://myBucket/path1/path2). Uploads to /path2 are themselves folders, so a sample might look like:

s3://myBucket/path1/path2/v6.1.0
s3://myBucket/path1/path2/v6.1.1
s3://myBucket/path1/path2/v6.1.102
s3://myBucket/path1/path2/v6.1.2
s3://myBucket/path1/path2/v6.1.25
s3://myBucket/path1/path2/v6.1.99

S3 doesn't sort by version number (which makes sense), so the alphabetically last entry in the list is not necessarily the last one uploaded. In this example, .../v6.1.102 is the newest.
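
To illustrate the ordering problem, here is a minimal Python sketch comparing plain string sorting with a numeric sort of the version components. The helper name version_key is made up for this example, and it assumes folder names always follow the vMAJOR.MINOR.PATCH pattern shown above. Note that the highest version is not guaranteed to be the most recent upload; it only happens to be in this sample.

```python
import re
from typing import Tuple


def version_key(name: str) -> Tuple[int, ...]:
    """Turn 'v6.1.102' into (6, 1, 102) so each component compares numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", name))


# Folder names taken from the listing in the question.
folders = ["v6.1.0", "v6.1.1", "v6.1.102", "v6.1.2", "v6.1.25", "v6.1.99"]

# Plain string ordering compares character by character.
lexicographic_last = sorted(folders)[-1]

# Numeric ordering compares (6, 1, 99) against (6, 1, 102).
highest_version = max(folders, key=version_key)

print(lexicographic_last)  # v6.1.99
print(highest_version)     # v6.1.102
```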

Here's what I've got so far:

aws s3api list-objects \
--bucket myBucket \
--query "sort_by(Contents[?contains(Key, \`path1/path2\`)], &LastModified)" \
--max-items 20000

One problem here is that --max-items seems to count alphabetically through all files in the bucket, recursively. 20000 is enough to reach my files, but going through that many files is pretty slow.

So my questions are twofold:

1 - This still searches the whole bucket, but I just want to narrow it down to path2/. Can I do this?

2 - This lists only objects. Is it possible to pull up just a list of paths instead?

Basically the end goal is I just want a command to return the newest folder name like 'v6.1.102' from the example above.

Recommended answer

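To answer #1, you can add --prefix path1/path2 to limit what is listed from the bucket. Unlike the contains() filter in --query, which the CLI applies client-side after the keys have been listed, --prefix is applied by S3 itself.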
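
For question #2, one possible approach (not part of the original answer) is to combine the prefix with a "/" delimiter, in which case S3 groups keys into CommonPrefixes instead of returning every object. The sketch below uses boto3; the helper names folder_names and list_folders are hypothetical, and it assumes the prefix is passed with a trailing slash (path1/path2/) so that the grouping happens one level below path2. CommonPrefixes carry no LastModified, so this only yields the list of folder names, not which one is newest.

```python
from typing import Dict, List


def folder_names(response: Dict, prefix: str) -> List[str]:
    """Extract immediate sub-folder names from one list_objects_v2 response
    page that was requested with Delimiter='/'."""
    names = []
    for entry in response.get("CommonPrefixes", []):
        # Each entry looks like {"Prefix": "path1/path2/v6.1.0/"}.
        names.append(entry["Prefix"][len(prefix):].rstrip("/"))
    return names


def list_folders(bucket: str, prefix: str) -> List[str]:
    """Hypothetical wrapper: needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so folder_names can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    names: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        names.extend(folder_names(page, prefix))
    return names


# Example call (requires access to the bucket):
# list_folders("myBucket", "path1/path2/")
```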

As for sorting by last modified, one option is to use an SDK such as boto3: list the objects with list_objects_v2 and sort them programmatically. The listing already includes LastModified for each object, so head_object is only needed for metadata the listing does not return.
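
A minimal sketch of that SDK approach, assuming boto3 and a prefix with a trailing slash. The function names folders_by_recency and iter_objects, and the sample file names, are invented for this example. It relies only on the Key and LastModified fields of each listed object.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, List


def folders_by_recency(objects: Iterable[Dict], prefix: str) -> List[str]:
    """Return sub-folder names under prefix, most recently modified first."""
    latest: Dict[str, datetime] = {}
    for obj in objects:
        rest = obj["Key"][len(prefix):]
        if "/" not in rest:
            continue  # object sits directly under the prefix, not in a sub-folder
        folder = rest.split("/", 1)[0]
        # Remember the newest modification time seen for each folder.
        if folder not in latest or obj["LastModified"] > latest[folder]:
            latest[folder] = obj["LastModified"]
    return sorted(latest, key=latest.get, reverse=True)


def iter_objects(bucket: str, prefix: str) -> Iterator[Dict]:
    """Hypothetical wrapper: needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so folders_by_recency works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


# Offline demonstration with made-up objects shaped like the listing entries.
sample = [
    {"Key": "path1/path2/v6.1.99/a.bin", "LastModified": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"Key": "path1/path2/v6.1.102/a.bin", "LastModified": datetime(2020, 3, 1, tzinfo=timezone.utc)},
    {"Key": "path1/path2/v6.1.2/a.bin", "LastModified": datetime(2019, 6, 1, tzinfo=timezone.utc)},
]
print(folders_by_recency(sample, "path1/path2/")[0])  # v6.1.102
```

Against the real bucket, the call would be folders_by_recency(iter_objects("myBucket", "path1/path2/"), "path1/path2/"). This still reads every key under the prefix, so it scales with the number of objects under path2, not with the size of the whole bucket.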

Update

Alternatively, you can reverse-sort by LastModified in JMESPath and take the first item, which is the most recently modified object, then derive the directory from its key.

aws s3api list-objects-v2 \
--bucket myBucket \
--prefix path1/path2 \
--query 'reverse(sort_by(Contents,&LastModified))[0]'
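
The "gather the directory from there" step could look like the following Python sketch, which takes the JSON object printed by the command above and pulls out the first path segment after the prefix. The function name folder_from_cli_output and the sample file name build.zip are made up for illustration.

```python
import json
from typing import Optional


def folder_from_cli_output(cli_json: str, prefix: str) -> Optional[str]:
    """Return the sub-folder name from a single object entry printed as JSON."""
    obj = json.loads(cli_json)
    if not obj:
        return None  # guard against an empty result (JSON null)
    # Drop the prefix; lstrip handles a prefix given without a trailing slash.
    rest = obj["Key"][len(prefix):].lstrip("/")
    folder, sep, _ = rest.partition("/")
    # No separator means the object sits directly under the prefix.
    return folder if sep else None


# Made-up sample shaped like one entry of the Contents list.
sample_output = '{"Key": "path1/path2/v6.1.102/build.zip", "LastModified": "2020-03-01T12:00:00+00:00", "Size": 1}'
print(folder_from_cli_output(sample_output, "path1/path2"))  # v6.1.102
```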
