What is the algorithm to compute the Amazon-S3 ETag for a file larger than 5GB?


Question

Files uploaded to Amazon S3 that are smaller than 5GB have an ETag that is simply the MD5 hash of the file, which makes it easy to check if your local files are the same as what you put on S3.

But if your file is larger than 5GB, then Amazon computes the ETag differently.

For example, I did a multipart upload of a 5970150664 byte file in 380 parts. Now S3 shows it to have an ETag of 6bcf86bed8807b8e78f0fc6e0a53079d-380. My local file has an md5 hash of 702242d3703818ddefe6bf7da2bed757. I think the number after the dash is the number of parts in the multipart upload.
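A quick sanity check on that guess: dividing the file size by the suffix gives an average part size just under 15 MiB, which is consistent with a fixed 15 MiB part size where only the last part is short — so the suffix being the part count is at least plausible (this is back-of-the-envelope arithmetic, not anything the ETag format documents):

```shell
# Average bytes per part if the "-380" suffix is the part count
echo $(( 5970150664 / 380 ))   # → 15710922, just under 15 MiB (15728640)

# Number of parts a fixed 15 MiB part size would produce (ceiling division)
echo $(( (5970150664 + 15728640 - 1) / 15728640 ))   # → 380
```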

I also suspect that the new ETag (before the dash) is still an MD5 hash, but with some metadata from the multipart upload mixed in somehow.

Does anyone know how to compute the Etag using the same algorithm as Amazon S3?

Answer

Just verified one. Hats off to Amazon for making it simple enough to be guessable.

Say you uploaded a 14MB file and your part size is 5MB. Calculate 3 MD5 checksums corresponding to each part, i.e. the checksum of the first 5MB, the second 5MB, and the last 4MB. Then take the checksum of their concatenation. Since MD5 checksums are hex representations of binary data, just make sure you take the MD5 of the decoded binary concatenation, not of the ASCII or UTF-8 encoded concatenation. When that's done, add a hyphen and the number of parts to get the ETag.
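The procedure above can be sketched as a small shell function. This assumes a fixed 5MB part size and GNU-style tools (`md5sum` rather than macOS's `md5`, and `bs=1M` rather than `bs=1m`); the `s3_etag` name is my own, not an official tool:

```shell
# Hypothetical helper: compute a multipart-style ETag for a file,
# assuming every part is 5MB except possibly the last.
s3_etag() {
    file="$1"
    part_mb=5
    part_bytes=$(( part_mb * 1024 * 1024 ))
    size=$(wc -c < "$file")
    # Ceiling division: number of parts
    nparts=$(( (size + part_bytes - 1) / part_bytes ))

    digests=""
    i=0
    while [ "$i" -lt "$nparts" ]; do
        # Hex MD5 of part i; dd's skip is counted in units of bs
        d=$(dd bs=1M count="$part_mb" skip=$(( i * part_mb )) \
               if="$file" 2>/dev/null | md5sum | awk '{print $1}')
        digests="$digests$d"
        i=$(( i + 1 ))
    done

    # MD5 of the *binary* concatenation of the per-part digests
    # (xxd -r -p decodes the hex), then "-<number of parts>"
    printf '%s-%s\n' \
        "$(printf '%s' "$digests" | xxd -r -p | md5sum | awk '{print $1}')" \
        "$nparts"
}
```

For the 14MB example above, this would hash three parts (5MB, 5MB, 4MB) and print a 32-character hex digest followed by `-3`. Note this mirrors only the multipart scheme described here; a file uploaded in a single PUT gets a plain MD5 ETag with no suffix.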

Here are the commands to do it on Mac OS X from the console:

$ dd bs=1m count=5 skip=0 if=someFile | md5 >>checksums.txt
5+0 records in
5+0 records out
5242880 bytes transferred in 0.019611 secs (267345449 bytes/sec)
$ dd bs=1m count=5 skip=5 if=someFile | md5 >>checksums.txt
5+0 records in
5+0 records out
5242880 bytes transferred in 0.019182 secs (273323380 bytes/sec)
$ dd bs=1m count=5 skip=10 if=someFile | md5 >>checksums.txt
2+1 records in
2+1 records out
2599812 bytes transferred in 0.011112 secs (233964895 bytes/sec)

At this point all the checksums are in checksums.txt. To concatenate them, decode the hex, and get the MD5 checksum of the lot, just use

$ xxd -r -p checksums.txt | md5

And now append "-3" to get the ETag, since there were 3 parts.

It's worth noting that md5 on Mac OS X just writes out the checksum, but md5sum on Linux also outputs the filename. You'll need to strip that, but I'm sure there's some option to output only the checksum. You don't need to worry about whitespace because xxd will ignore it.
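For example, on Linux you can strip everything but the digest with `awk` (shown here hashing stdin, where `md5sum` prints `-` as the name; with a file argument it prints the filename in that column instead):

```shell
# md5sum prints "digest  name"; awk keeps only the hex digest
printf 'hello' | md5sum | awk '{print $1}'
# → 5d41402abc4b2a76b9719d911017c592
```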

Update: I was told about an implementation of this at https://github.com/Teachnova/s3md5, which doesn't work on OS X. Here's a Gist I wrote with a working script for OS X.
