How to get size of all files in an S3 bucket with versioning?

Question

I know this command can provide the size of all files in a bucket:

aws s3 ls mybucket --recursive --summarize --human-readable

But this does not account for versions.

If I run this command:

aws s3 ls s3://mybucket/myfile --human-readable

It will show something like "100 MiB" but it may have 10 versions of this file which will be more like "1 GiB" total.

The closest I have is getting the sizes of every version of a given file:

aws s3api list-object-versions --bucket mybucket --prefix "myfile" --query 'Versions[?StorageClass==`STANDARD`].Size' > /tmp/s3_myfile_version_sizes

Then take the sum of all version sizes.

But I would have to rerun this command for every file in a bucket.

Is there an easier way?

Answer

You can run list-object-versions on the whole bucket:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size'

Use jq to sum the sizes:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add

Or, if you need human-readable output:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add | numfmt  --to=iec-i --suffix=B
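numfmt is part of GNU coreutils, so it may be missing on systems that do not ship them. As a rough fallback, the conversion can be sketched with awk alone. The function name human_bytes is made up for this example, and its rounding is not guaranteed to match numfmt exactly.

```shell
# Convert a byte count to IEC units (KiB, MiB, ...) using only awk.
# human_bytes is a hypothetical helper name, not part of the AWS CLI.
human_bytes() {
  awk -v b="$1" 'BEGIN {
    split("B KiB MiB GiB TiB PiB", unit, " ")
    i = 1
    # Divide by 1024 until the value fits the current unit.
    while (b >= 1024 && i < 6) { b /= 1024; i++ }
    if (i == 1) printf "%d%s\n", b, unit[i]
    else        printf "%.1f%s\n", b, unit[i]
  }'
}

# Usage (bucket name is a placeholder):
#   human_bytes "$(aws s3api list-object-versions --bucket my-bucket \
#     --query 'Versions[*].Size' | jq add)"
```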

You can also add a prefix if you want the size of a given "folder", and have jq print the number of object versions as well as their total size:

aws s3api list-object-versions --bucket my-bucket --prefix my-folder --query 'Versions[*].Size' | jq 'length,add'

Or you can use jq filtering to write more complex filters, for example, including only non-current objects:

aws s3api list-object-versions --bucket my-bucket --prefix my-folder | jq '[.Versions[]|select(.IsLatest == false)|.Size] | length,add'

If jq is not available, the --output text option unfortunately prints the sizes as tab-separated values on a single line. Here's a hack: selecting a list per version ([Size,Size]) forces one row per version, and awk can then add up the first column:

aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].[Size,Size]' --output text  | awk '{s+=$1} END {printf "%.0f", s}'
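The same text-output approach can be extended to answer the original question per file, without rerunning the command for each key. The sketch below assumes the query is changed to 'Versions[*].[Key,Size]' so that each line is a key and a size separated by a tab. The function name sum_versions_per_key is made up for this example, and keys that themselves contain tabs would not be handled.

```shell
# Sum the sizes of all versions per key.
# Input:  "Key<TAB>Size" lines on stdin, one per object version.
# Output: "total_bytes<TAB>version_count<TAB>key", largest total first.
# sum_versions_per_key is a hypothetical helper name.
sum_versions_per_key() {
  awk -F'\t' '
    # Skip malformed lines (e.g. "None" when there are no versions).
    NF >= 2 { bytes[$1] += $2; count[$1]++ }
    END { for (k in bytes) printf "%.0f\t%d\t%s\n", bytes[k], count[k], k }
  ' | sort -t "$(printf '\t')" -k1,1nr -k3,3
}

# Usage (bucket name is a placeholder):
#   aws s3api list-object-versions --bucket my-bucket \
#     --query 'Versions[*].[Key,Size]' --output text | sum_versions_per_key
```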

If you have a large number of objects, it might be better to use data provided by the Amazon S3 Storage Inventory:

Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
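Once an inventory report that includes all versions has been delivered, the sizes can be totalled offline. The following is only a sketch under several assumptions: the report is CSV with every field wrapped in double quotes, it has no header row, and the position of the Size column is passed in by the caller (the actual column order should be checked in the fileSchema entry of the report's manifest.json). The function name sum_inventory_sizes and the file name in the usage comment are made up for this example.

```shell
# Sum one column of an S3 Inventory CSV read from stdin.
# $1 = 1-based position of the Size column (ASSUMPTION: verify it against
#      the fileSchema listed in the inventory manifest.json).
# sum_inventory_sizes is a hypothetical helper name.
sum_inventory_sizes() {
  awk -F'","' -v col="$1" '
    {
      v = $col
      gsub(/"/, "", v)          # strip the quotes left on first/last fields
      if (v != "") { total += v; n++ }   # rows with an empty size are skipped
    }
    END { printf "%.0f bytes in %d rows\n", total, n }
  '
}

# Usage (file name and column position are placeholders):
#   gunzip -c inventory-part.csv.gz | sum_inventory_sizes 6
```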
