Combine MD5 hashes of multiple files


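Problem description

I have 7 files that I generate MD5 hashes for. The hashes are used to ensure that a remote copy of the data store is identical to the local copy. Unfortunately, the link between the two copies of the data is mind-numbingly slow. Changes to the data are very rare, but I have a requirement that the data be synchronized at all times (or as soon as possible). Rather than passing 7 separate MD5 hashes across my (extremely slow) communications link, I'd like to generate the hash for each file and then combine these hashes into a single hash that I can transfer and then recompute on the remote side for comparison. If the "combined hash" differs, I'd start sending the 7 individual hashes to determine exactly which file(s) have changed. For example, here are the MD5 hashes for the 7 files as of last week: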

0709d609d69385255c496436eb50402c
709465a74411bd596595c7b9b158ae6a
4ab657320ef33e3d5eb498e4c13d41b7
3b49c6ab199994fd776bb63761414e72
0fc28c5a010fc3c06c0c930c88e31a15
c4ecd214662cac5aae0e53f6f252bf0e
8b086431e43148a2c2d943ba30d31cc6

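I'd like to combine these hashes so that I get a single value (perhaps another MD5 hash?) that I can send to the remote system. On the remote system, I'd perform the same calculation to determine whether the data as a whole has changed. If it has, I'd start sending the individual hashes, and so on. The most important factor is that my "combined hash" be short enough that it uses less bandwidth than simply sending all 7 hashes in the first place. I thought of writing the 7 MD5 hashes to a file and then hashing that file, but is there a better way?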

Solution

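Why don't you:

  • Generate the 7 MD5 hashes (which is what you are doing now), and then
  • Concatenate these 7 hash outputs, in the same file order on both ends, into one larger byte array and MD5 hash that array to produce an overall hash. (Each MD5 hash is 16 bytes, so you end up with a 112-byte array, which you hash to get the 16-byte overall hash.)

If your overall hash matches the one from the other end, nothing needs to be done. If not, you start sending your 7 intermediate hashes to work out which file(s) have changed.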
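The two steps above could be sketched in Python roughly as follows. This is only an illustration, not the answerer's code: the helper names `file_md5`, `combined_md5`, and `changed_indices` are hypothetical, and the sample data is simply the seven example hashes from the question, decoded from hex into their raw 16-byte form. It assumes both ends list the files in the same agreed order.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the raw 16-byte MD5 digest of the file at `path` (hypothetical helper)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def combined_md5(digests):
    """MD5 of the per-file digests concatenated in the order given."""
    return hashlib.md5(b"".join(digests)).digest()


def changed_indices(local_digests, remote_digests):
    """Positions at which the two per-file digest lists differ."""
    return [
        i
        for i, (a, b) in enumerate(zip(local_digests, remote_digests))
        if a != b
    ]


# The seven example hashes from the question, in a fixed order.
hex_hashes = [
    "0709d609d69385255c496436eb50402c",
    "709465a74411bd596595c7b9b158ae6a",
    "4ab657320ef33e3d5eb498e4c13d41b7",
    "3b49c6ab199994fd776bb63761414e72",
    "0fc28c5a010fc3c06c0c930c88e31a15",
    "c4ecd214662cac5aae0e53f6f252bf0e",
    "8b086431e43148a2c2d943ba30d31cc6",
]
digests = [bytes.fromhex(h) for h in hex_hashes]

# 7 digests x 16 bytes = 112 bytes of input; the overall hash is 16 bytes.
overall = combined_md5(digests)
print(overall.hex())
```

In this sketch only the 16-byte `overall` value would cross the link in the common case. The 7 individual digests (112 bytes) would be sent, and compared with something like `changed_indices`, only when the two overall values differ.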


