Calculate S3 ETag locally using spark-md5


Question

I have uploaded a 14 MB file to S3 in 5 MB chunks, and I have also computed the hash of each chunk using spark-md5. The hash of each individual chunk (generated by spark-md5) matches the ETag of the corresponding part uploaded to S3.

However, the ETag returned after completing the full multipart upload does not match the hash I compute locally with spark-md5. These are the steps of the local hash:

  1. Generate the hash of each chunk (with spark-md5)
  2. Concatenate the chunk hashes
  3. Convert the result to a hex string
  4. Compute the hash of that string

Below is the code; please check whether there is any mistake.

Approach 1:

        var mergeChunk = self.chunkArray.join('');
        console.log("mergeChunk: " + mergeChunk);

        var hexString = toHexString(mergeChunk);
        console.log("toHexString: " + hexString);

        var cspark1 = SparkMD5.hash(hexString);
        console.log("SparkMD5 final hash: " + cspark1);

Approach 2:

        var mergeChunk = self.chunkArray.join('');
        console.log("mergeChunk: " + mergeChunk);
        var cspark2 = SparkMD5.hash(mergeChunk);
        console.log("SparkMD5 final hash: " + cspark2);

Please provide the correct logic for calculating the ETag.

Answer

ETags are meant to be opaque; AWS makes no guarantee about what the ETag of a multipart upload is.

In practice it appears to be the MD5 of the concatenation of the raw per-part MD5 digests (in the order the parts are listed in the final POST), with the part count appended after a dash, but you cannot rely on that.

