Possible to calculate MD5 (or other) hash with buffered reads?
Question
I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:
private byte[] calcHash(string file)
{
    // using blocks release the hash algorithm and the stream even if ComputeHash throws
    using (System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create())
    using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
    {
        return ha.ComputeHash(fs);
    }
}
However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I am convinced that I once saw an override of a hash function that let me calculate an MD5 (or other) hash while writing, i.e. calculating the hash of one buffer, then feeding the resulting hash into the next iteration.
Something like this: (pseudocode-ish)
byte[] hash = new byte[] { 0,0,0,0,0,0,0,0 };
while (!eof)
{
    buffer = readFromSourceFile();
    writefile(buffer);
    hash = calchash(buffer, hash);
}
hash is now similar to what would be obtained by running the calcHash function on the entire file.
Now, I can't find any overrides like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just lousy at searching? The reason for doing the writing and the checksum calculation at once is that it makes sense given the large files.
Recommended answer
You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.
// Init
MD5 md5 = MD5.Create();
int offset = 0;
// For each block:
offset += md5.TransformBlock(block, 0, block.Length, block, 0);
// For last block:
md5.TransformFinalBlock(block, 0, block.Length);
// Get the hash code
byte[] hash = md5.Hash;
Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
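To tie this back to the question's write-and-hash loop, here is a minimal sketch of how the two calls might fit into a buffered copy. The class name HashingCopy, the method name CopyAndHash and its parameters are hypothetical, chosen only for illustration. It relies on the empty-final-block approach from the note above, so treat it as one possible arrangement rather than the definitive one.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashingCopy
{
    // Hypothetical helper (not part of the original answer): copies sourcePath
    // to destPath in bufferSize chunks and returns the MD5 of the bytes copied.
    public static byte[] CopyAndHash(string sourcePath, string destPath, int bufferSize)
    {
        using (HashAlgorithm md5 = MD5.Create())
        using (FileStream source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (FileStream dest = new FileStream(destPath, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                dest.Write(buffer, 0, read);
                // Feed only the bytes actually read into the running hash.
                md5.TransformBlock(buffer, 0, read, buffer, 0);
            }
            // Finalise with an empty block once every chunk has been fed in.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

Calling HashingCopy.CopyAndHash(src, dst, 32 * 1024 * 1024) would mirror the 32 MB buffers mentioned in the question. The returned bytes should match what ComputeHash produces for the same file, because the hash state is carried inside the HashAlgorithm object between calls instead of being passed from one iteration to the next.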