Grab Metadata for All Objects in an S3 Bucket


Question

I am currently tagging objects in S3 by attaching a metadata tag containing an MD5 hash. When I upload objects, I check whether the MD5 hash is different before I push the object to S3.

I was wondering whether there is a way to grab the metadata for all the objects in an S3 bucket at once, since it seems to take some time to fetch it individually for each item.

I am using the following to grab the hash from S3:

$hash = Get-S3ObjectMetadata -Credential $AwsCredentials -BucketName $Bucketname -Key $key

When I remove the -Key value, I get the following error:

Get-S3ObjectMetadata : Key is a required property and must be set before making this call.

I also tried Get-Help -Full and got the following information on -Key:

    -Key <System.String>
    The key of the object.

    Required?                    false
    Position?                    2
    Default value                None
    Accept pipeline input?       True (ByPropertyName)
    Accept wildcard characters?  false

This seems to contradict the error I get.

Recommended Answer

Fetching object metadata has no batch mode. You have to fetch each object's metadata individually, one request per object. The usual approach is to use multiple processes or threads to send the requests in parallel. A bucket should be able to handle such requests at a rate of several hundred requests per second with no problems.
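The parallel approach described above could be sketched as follows. This is a minimal illustration in Python rather than PowerShell, assuming a boto3-style S3 client whose `head_object(Bucket=..., Key=...)` call returns the user-defined metadata under `"Metadata"`. The function name `fetch_all_metadata` and the `max_workers` default are hypothetical choices, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_metadata(client, bucket, keys, max_workers=32):
    """Send one HEAD request per key in parallel.

    `client` is assumed to behave like a boto3 S3 client.
    Returns a dict mapping each key to its user-defined metadata.
    """
    def head(key):
        # One request per object; S3 offers no batch call for this.
        response = client.head_object(Bucket=bucket, Key=key)
        return key, response.get("Metadata", {})

    # Threads suit this workload because each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(head, keys))


# Possible usage with boto3 (bucket name is a placeholder):
#   import boto3
#   client = boto3.client("s3")
#   pages = client.get_paginator("list_objects_v2").paginate(Bucket="my-bucket")
#   keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]
#   metadata = fetch_all_metadata(client, "my-bucket", keys)
```

The worker count is a tuning knob: it should be high enough to hide per-request latency but low enough to stay within the request rate the bucket handles comfortably.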

Your solution is flawed, since it will not scale. One solution I have seen is to set the object key itself to the hash of the object payload, which gives you essentially automatic deduplication. SHA-256 is a better choice for this than MD5 or SHA-1, since both of those have known collisions.
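To make the content-addressed idea concrete, here is a small Python sketch. The helper names `content_key` and `needs_upload` and the optional `prefix` parameter are hypothetical. The set of existing keys is assumed to come from a bucket listing made beforehand.

```python
import hashlib


def content_key(payload: bytes, prefix: str = "") -> str:
    """Derive the object key from the SHA-256 of the payload."""
    return prefix + hashlib.sha256(payload).hexdigest()


def needs_upload(payload: bytes, existing_keys, prefix: str = "") -> bool:
    """Identical payloads map to the same key, so an existing key means
    the content is already stored and the upload can be skipped."""
    return content_key(payload, prefix) not in existing_keys
```

With this scheme, a single listing of the bucket is enough to decide which files to upload, so no per-object metadata requests are needed.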

Unless you are using SSE-C or SSE-KMS, the ETag is the MD5 sum of the object body. If the object was uploaded using the multipart upload API, the ETag is instead the MD5 sum of the concatenated binary (not hex) MD5 sums of the individual parts, followed by - and the number of parts.
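The ETag rule above can be reproduced locally, which allows comparing a file against the ETag S3 reports without storing a separate hash. The following Python sketch assumes you know the part size that was used for a multipart upload. The function name `expected_etag` is hypothetical. Note that S3 returns the ETag wrapped in double quotes, so strip them before comparing.

```python
import hashlib


def expected_etag(payload: bytes, part_size=None) -> str:
    """Compute the ETag S3 is expected to report for `payload`.

    part_size=None models a single-request upload; otherwise the payload
    is split into parts of `part_size` bytes as in a multipart upload.
    Does not apply to objects encrypted with SSE-C or SSE-KMS.
    """
    if part_size is None:
        # Plain upload: the ETag is the hex MD5 of the body.
        return hashlib.md5(payload).hexdigest()

    # An empty multipart upload is treated here as one empty part.
    parts = [payload[i:i + part_size]
             for i in range(0, len(payload), part_size)] or [b""]
    # Concatenate the binary (not hex) digests of the parts, then hash again.
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

Since bucket listings include each object's ETag, this comparison may avoid the per-object metadata requests entirely, as long as the encryption caveat above does not apply.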
