How to get size of all files in an S3 bucket with versioning?
Question
I know this command can provide the size of all files in a bucket:
aws s3 ls mybucket --recursive --summarize --human-readable
But this does not account for versions.
If I run this command:
aws s3 ls s3://mybucket/myfile --human-readable
It will show something like "100 MiB", but the file may have 10 versions, so its real total is closer to "1 GiB".
The closest I have come is getting the sizes of every version of a given file:
aws s3api list-object-versions --bucket mybucket --prefix "myfile" --query 'Versions[?StorageClass==`STANDARD`].Size' > /tmp/s3_myfile_version_sizes
Then take the sum of all version sizes.
But I would have to rerun this command for every file in the bucket.
Is there a simpler way?
Recommended answer
You can run list-object-versions on the whole bucket:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size'
Use jq to sum the sizes:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add
Or, if you need human-readable output:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add | numfmt --to=iec-i --suffix=B
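numfmt is part of GNU coreutils and may not be installed everywhere. As a rough sketch, the formatting step could be done with plain awk instead. The function name bytes_to_iec is made up for this example; it assumes one byte count per input line and adds them all up before formatting.

```shell
# Hypothetical helper (not part of the AWS CLI): add up the byte counts found
# in the first field of each stdin line and print the total with an IEC suffix.
bytes_to_iec() {
  awk '
    { total += $1 }
    END {
      split("B KiB MiB GiB TiB PiB", unit, " ")
      i = 1
      # Divide by 1024 until the value fits the current unit.
      while (total >= 1024 && i < 6) { total /= 1024; i++ }
      printf "%.1f %s\n", total, unit[i]
    }'
}

# Usage (assumes the AWS CLI is configured and my-bucket exists):
# aws s3api list-object-versions --bucket my-bucket \
#   --query 'Versions[*].Size' | jq add | bytes_to_iec
```

The output format ("1.5 MiB") differs slightly from what numfmt prints, so treat it as a stand-in rather than a drop-in replacement.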
You can also add a prefix if you want the size of a given "folder", and get the number of object versions as well:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder --query 'Versions[*].Size' | jq 'length, add'
Or you can do the filtering in jq to write more complex filters, for example to include only non-current versions:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder | jq '[.Versions[]|select(.IsLatest == false)|.Size] | length,add'
If jq is not available, the --output text option unfortunately produces tab-separated values on a single line. Selecting a list per version ([Size,Size]) is a hack that forces one line per version, so awk can add up the first column:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].[Size,Size]' --output text | awk '{s+=$1} END {printf "%.0f", s}'
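The same text-output approach can also give a per-file breakdown, which is what the question was trying to do one file at a time. The following is a minimal sketch, assuming the query selects [Key,Size] so that each line is "key<TAB>size", and that no key contains a tab or newline. The function name per_key_totals is made up for this example.

```shell
# Hypothetical helper: read "key<TAB>size" lines from stdin and print
# "total_bytes<TAB>version_count<TAB>key" per key, largest total first.
per_key_totals() {
  awk -F '\t' '
    { bytes[$1] += $2; count[$1]++ }
    END {
      for (k in bytes) printf "%.0f\t%d\t%s\n", bytes[k], count[k], k
    }' | sort -t "$(printf '\t')" -k1,1nr
}

# Usage (assumes the AWS CLI is configured and my-bucket exists):
# aws s3api list-object-versions --bucket my-bucket \
#   --query 'Versions[*].[Key,Size]' --output text | per_key_totals
```

A file with ten 100 MiB versions then shows up as a single line with a count of 10 and a total of about 1 GiB.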
If you have a large number of objects, it might be better to use data provided by the Amazon S3 Storage Inventory:
Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
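Once an inventory report has been downloaded, the total could be computed locally without listing the bucket at all. The sketch below rests on several assumptions: the report is CSV with double-quoted fields, no field contains a literal comma, the inventory is configured to include all versions and the Size field, and you know which column holds Size (it depends on the fields chosen in the inventory configuration). The function name inventory_total_bytes and the file name in the usage line are made up for this example.

```shell
# Hypothetical helper: sum one column of a CSV inventory report read from stdin.
# $1 is the 1-based position of the Size column (an assumption to verify
# against your own inventory configuration).
inventory_total_bytes() {
  awk -F ',' -v col="$1" '
    {
      v = $col
      gsub(/"/, "", v)   # strip the surrounding double quotes
      total += v         # empty sizes (e.g. delete markers) count as 0
    }
    END { printf "%.0f\n", total }'
}

# Usage (assumes a gzip-compressed report with Size in column 6):
# gunzip -c inventory-data-file.csv.gz | inventory_total_bytes 6
```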