What is the algorithm to compute the Amazon S3 ETag for a file larger than 5GB?

Question

Files uploaded to Amazon S3 that are smaller than 5GB have an ETag that is simply the MD5 hash of the file, which makes it easy to check if your local files are the same as what you put on S3.

But if your file is larger than 5GB, then Amazon computes the ETag differently.

For example, I did a multipart upload of a 5,970,150,664-byte file in 380 parts. S3 now shows it with an ETag of 6bcf86bed8807b8e78f0fc6e0a53079d-380, while my local file has an MD5 hash of 702242d3703818ddefe6bf7da2bed757. I think the number after the dash is the number of parts in the multipart upload.

I also suspect that the part of the new ETag before the dash is still an MD5 hash, but one that somehow incorporates metadata from the multipart upload.

Does anyone know how to compute the ETag using the same algorithm as Amazon S3?

Recommended answer

Say you uploaded a 14MB file to a bucket without server-side encryption, and your part size is 5MB. Calculate 3 MD5 checksums corresponding to each part, i.e. the checksum of the first 5MB, the second 5MB, and the last 4MB. Then take the checksum of their concatenation. MD5 checksums are often printed as hex representations of binary data, so make sure you take the MD5 of the decoded binary concatenation, not of the ASCII or UTF-8 encoded concatenation. When that's done, add a hyphen and the number of parts to get the ETag.
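
The procedure described above can be sketched in Python as follows. This is a minimal illustration, not an official implementation: the function name multipart_etag and the constant MIB are made up for this example, and it assumes the stream is buffered, so that read(part_size) returns a full part until the end of the data is reached.

```python
import hashlib
import io
from typing import BinaryIO

MIB = 1024 * 1024  # "MB" in the answer is taken to mean 1,048,576 bytes


def multipart_etag(stream: BinaryIO, part_size: int) -> str:
    """Return a multipart-style ETag for the bytes in `stream`.

    Hypothetical helper: it hashes each part with MD5, hashes the
    concatenation of the raw (binary) part digests, and appends
    "-<number of parts>".
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    digests = []
    while True:
        part = stream.read(part_size)
        if not part:
            break
        # Keep the 16 raw digest bytes, not the 32-character hex string.
        digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


if __name__ == "__main__":
    # 14 MiB of zero bytes split into 5 MiB parts gives three parts.
    data = bytes(14 * MIB)
    print(multipart_etag(io.BytesIO(data), 5 * MIB))
```

For a real file, the same function can be called with open("someFile", "rb") and the part size that was used for the upload.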

Here are the commands to do it on Mac OS X from the console:

$ dd bs=1m count=5 skip=0 if=someFile | md5 >>checksums.txt
5+0 records in
5+0 records out
5242880 bytes transferred in 0.019611 secs (267345449 bytes/sec)
$ dd bs=1m count=5 skip=5 if=someFile | md5 >>checksums.txt
5+0 records in
5+0 records out
5242880 bytes transferred in 0.019182 secs (273323380 bytes/sec)
$ dd bs=1m count=5 skip=10 if=someFile | md5 >>checksums.txt
2+1 records in
2+1 records out
2599812 bytes transferred in 0.011112 secs (233964895 bytes/sec)

At this point all the checksums are in checksums.txt. To concatenate them and decode the hex and get the MD5 checksum of the lot, just use

$ xxd -r -p checksums.txt | md5

And now append "-3" to get the ETag, since there were 3 parts.
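
The final step of the walk-through (decode the hex checksums, hash the result, append the part count) could look roughly like this in Python. It is only a sketch: both function names are invented for this example, and the second one exists purely to show the mistake the answer warns about, namely hashing the hex text instead of the decoded bytes.

```python
import hashlib
from typing import Iterable


def etag_from_part_checksums(hex_checksums: Iterable[str]) -> str:
    """Combine per-part MD5 hex strings into a multipart-style ETag."""
    # Decode each hex string to its 16 raw bytes, as `xxd -r -p` does,
    # skipping blank lines.
    raw = [bytes.fromhex(h.strip()) for h in hex_checksums if h.strip()]
    return f"{hashlib.md5(b''.join(raw)).hexdigest()}-{len(raw)}"


def wrong_etag_from_hex_text(hex_checksums: Iterable[str]) -> str:
    """The pitfall: hashing the ASCII hex text gives a different value."""
    lines = [h.strip() for h in hex_checksums if h.strip()]
    text = "".join(lines).encode("ascii")
    return f"{hashlib.md5(text).hexdigest()}-{len(lines)}"


if __name__ == "__main__":
    # Stand-in for the lines of checksums.txt: the MD5 of three tiny parts.
    checksums = [hashlib.md5(p).hexdigest() for p in (b"a", b"b", b"c")]
    print(etag_from_part_checksums(checksums))
    print(wrong_etag_from_hex_text(checksums))  # differs from the line above
```

With the file from the commands above, the input would be the lines of checksums.txt, for example open("checksums.txt").read().split().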

注意事项

  • If you uploaded with aws-cli via aws s3 cp, then you most likely have an 8MB chunk size. According to the docs, that is the default.
  • If the object is encrypted with SSE-KMS or SSE-C server-side encryption, the ETag won't be an MD5 checksum (see the API documentation). But if you're just trying to verify that an uploaded part matches what you sent, you can use the Content-MD5 header and S3 will compare it for you.
  • md5 on macOS writes out only the checksum, but md5sum on Linux (or installed via brew) also outputs the file name. You'll need to strip that, for example by piping the output through cut -d ' ' -f 1. You don't need to worry about whitespace because xxd will ignore it.
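
Two of the notes above lend themselves to small helpers, sketched here under stated assumptions. The first guesses which whole-MiB part sizes are consistent with a known file size and part count, which helps when you don't know the chunk size the uploader used; it assumes the part size was a whole number of MiB between 5 MiB and 5 GiB. The second builds a Content-MD5 header value, which is the base64 encoding of the raw MD5 digest. Both function names are hypothetical.

```python
import base64
import hashlib
from typing import List

MIB = 1024 * 1024


def candidate_part_sizes_mib(file_size: int, part_count: int,
                             max_mib: int = 5120) -> List[int]:
    """List whole-MiB part sizes that split file_size into part_count parts."""
    sizes = []
    for mib in range(5, max_mib + 1):
        part_size = mib * MIB
        # Integer ceiling division: number of parts at this part size.
        if -(-file_size // part_size) == part_count:
            sizes.append(mib)
    return sizes


def content_md5(data: bytes) -> str:
    """Return the Content-MD5 header value for a request body."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


if __name__ == "__main__":
    # The file from the question: 5,970,150,664 bytes uploaded in 380 parts.
    print(candidate_part_sizes_mib(5_970_150_664, 380))  # prints [15]
    print(content_md5(b"example part body"))
```

For the file in the question this points to 15 MiB parts, so that is the part size to try first when recomputing its ETag locally.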

Code links

  • A Gist I wrote with a working script for macOS.
  • The project at s3md5.
