Possible to calculate MD5 (or other) hash with buffered reads?
Question
I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:
private byte[] calcHash(string file)
{
    // Dispose the hash algorithm and the stream even if ComputeHash throws.
    using (System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create())
    using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
    {
        return ha.ComputeHash(fs);
    }
}
However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I am fairly sure I once saw an overload of a hash function that let me calculate an MD5 (or other) hash at the same time as writing, i.e. calculating the hash of one buffer, then feeding the resulting hash into the next iteration.
Something like this (pseudocode):
byte[] hash = new byte[] { 0,0,0,0,0,0,0,0 };
while(!eof)
{
    buffer = readFromSourceFile();
    writefile(buffer);
    hash = calchash(buffer, hash);
}
hash is now the same as what would be obtained by running the calcHash function on the entire file.
Now, I can't find any overloads like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just lousy at searching? The reason for doing the writing and the checksum calculation in one pass is that the files are large, so reading them a second time just to hash them is costly.
Recommended answer
You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.
// Init
MD5 md5 = MD5.Create();
int offset = 0;
// For each block:
offset += md5.TransformBlock(block, 0, block.Length, block, 0);
// For last block:
md5.TransformFinalBlock(block, 0, block.Length);
// Get the hash code
byte[] hash = md5.Hash;
Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
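As a rough illustration of the note above, the copy loop from the question could be combined with TransformBlock along the following lines. This is only a sketch: the class name HashingCopy, the method name CopyAndHash, and the bufferSize parameter are made up for the example and are not part of the framework. It passes the same buffer as input and output, as in the answer's snippet, and it hashes only the bytes actually read, since the last read is usually shorter than the buffer.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class HashingCopy
{
    // Hypothetical helper: copies sourcePath to destPath in chunks of
    // bufferSize bytes and returns the MD5 hash of the copied data.
    public static byte[] CopyAndHash(string sourcePath, string destPath, int bufferSize)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (FileStream dest = new FileStream(destPath, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Write the chunk, then feed exactly the same bytes to the hash.
                dest.Write(buffer, 0, read);
                md5.TransformBlock(buffer, 0, read, buffer, 0);
            }

            // An empty final block completes the hash computation.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

Calling HashingCopy.CopyAndHash(source, destination, 32 * 1024 * 1024) would mirror the 32 MB writes mentioned in the question. The result should match what calcHash returns for the finished file, because the hash sees the same bytes in the same order.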