Get a file SHA256 Hash code and Checksum


Question


Previously I asked a question about combining SHA1+MD5, but after that I understood that calculating SHA1 and then MD5 of a large file is not much faster than SHA256. In my case, a 4.6 GB file takes about 10 minutes with the default SHA256 implementation (C# Mono) on a Linux system.

public static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}

Then I read this topic and changed my code according to what it said, to:

public static string GetChecksumBuffered(Stream stream)
{
    using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(bufferedStream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}


But it doesn't have much of an effect and takes about 9 minutes.


Then I tested the same file with the sha256sum command in Linux, and it takes about 28 seconds; both the above code and the Linux command give the same result!


Someone advised me to read about the differences between a hash code and a checksum, and I reached this topic that explains the differences.

My questions are:

  1. What causes such a large difference in time between the above code and the Linux sha256sum command?

  2. What does the above code do? (I mean, is it a hash code calculation or a checksum calculation? If you search for how to get the hash code of a file and the checksum of a file in C#, both lead to the above code.)

  3. Is there any motivated attack against sha256sum, even though SHA256 is collision resistant?

  4. How can I make my implementation as fast as sha256sum in C#?

Answer


  1. My best guess is that there's some additional buffering in the Mono implementation of the File.Read operation. Having recently looked into checksums on a large file, on a decent-spec Windows machine you should expect roughly 6 seconds per GB if all is running smoothly.


Oddly, it has been reported in more than one benchmark test that SHA-512 is noticeably quicker than SHA-256 (see 3 below). One other possibility is that the problem is not in allocating the data, but in disposing of the bytes once read. You may be able to use TransformBlock (and TransformFinalBlock) on a single array rather than reading the stream in one big gulp; I have no idea if this will work, but it bears investigating.
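The TransformBlock idea above could be sketched roughly like this: read the file in fixed-size chunks into a single reusable buffer and feed each chunk to the hash object, rather than handing the whole stream to ComputeHash. Whether it actually beats ComputeHash here is untested, and the 1 MB chunk size is an assumption to tune:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChunkedHasher
{
    public static string GetChecksumChunked(string file)
    {
        const int chunkSize = 1024 * 1024; // assumed size; worth benchmarking
        byte[] buffer = new byte[chunkSize];

        using (var sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(file))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed each chunk into the hash; the output buffer is not needed.
                sha.TransformBlock(buffer, 0, read, null, 0);
            }
            // Finalize with an empty block so sha.Hash becomes available.
            sha.TransformFinalBlock(buffer, 0, 0);
            return BitConverter.ToString(sha.Hash).Replace("-", string.Empty);
        }
    }
}
```

Because only one buffer is allocated up front, this avoids repeatedly handing freshly allocated byte arrays to the garbage collector, which is the "disposing of the bytes once read" concern mentioned above.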


  2. The difference between a hash code and a checksum is (nearly) semantics. They both calculate a shorter 'magic' number that is fairly unique to the data in the input, though if you have 4.6 GB of input and 64 B of output, 'fairly' is somewhat limited.

  • A checksum is not secure: with a bit of work you can figure out the input from enough outputs, work backwards from output to input, and do all sorts of insecure things.
  • A cryptographic hash takes longer to calculate, but changing even a single bit in the input will completely change the output, and for a good hash (e.g. SHA-512) there is no known way of getting from the output back to the input.


  3. MD5 is breakable: you can fabricate an input to produce any given output, if needed, on a PC. SHA-256 is (probably) still secure, but won't be in a few years' time; if your project has a lifespan measured in decades, assume you'll need to change it. SHA-512 has no known attacks and probably won't for quite a while, and since it's quicker than SHA-256 I'd recommend it anyway. Benchmarks show that calculating SHA-512 takes about 3 times longer than MD5, so if your speed issue can be dealt with, it's the way to go.
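Switching the original GetChecksum method to SHA-512, as recommended above, is essentially a one-line change of the algorithm (shown here with the non-obsolete SHA512.Create() factory rather than the Managed class):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class Sha512Checksum
{
    // Same shape as the question's GetChecksum, but using SHA-512.
    public static string GetChecksum(string file)
    {
        using (var sha = SHA512.Create())
        using (FileStream stream = File.OpenRead(file))
        {
            byte[] checksum = sha.ComputeHash(stream);
            // SHA-512 yields 64 bytes, so the hex string is 128 characters.
            return BitConverter.ToString(checksum).Replace("-", string.Empty);
        }
    }
}
```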


  4. No idea, beyond those mentioned above. You're doing it right.


For a bit of light reading, see Crypto.SE: SHA512 faster than SHA256?

Edit in response to a question in the comments


The purpose of a checksum is to allow you to check if a file has changed between the time you originally wrote it, and the time you come to use it. It does this by producing a small value (512 bits in the case of SHA512) where every bit of the original file contributes at least something to the output value. The purpose of a hashcode is the same, with the addition that it is really, really difficult for anyone else to get the same output value by making carefully managed changes to the file.


The premise is that if the checksums are the same at the start and when you check it, then the files are the same, and if they're different the file has certainly changed. What you are doing above is feeding the file, in its entirety, through an algorithm that rolls, folds and spindles the bits it reads to produce the small value.


As an example: in the application I'm currently writing, I need to know if parts of a file of any size have changed. I split the file into 16K blocks, take the SHA-512 hash of each block, and store it in a separate database on another drive. When I come to see if the file has changed, I reproduce the hash for each block and compare it to the original. Since I'm using SHA-512, the chances of a changed file having the same hash are unimaginably small, so I can be confident of detecting changes in 100s of GB of data whilst only storing a few MB of hashes in my database. I'm copying the file at the same time as taking the hash, and the process is entirely disk-bound; it takes about 5 minutes to transfer a file to a USB drive, of which 10 seconds is probably related to hashing.
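The per-block scheme described above could be sketched as follows. The block size and method names are illustrative, and it uses SHA-256 to match the rest of the post (the answer's own application used SHA-512):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class BlockHasher
{
    // Hash a file in independent fixed-size blocks so that a change in any
    // part of the file shows up as a changed hash for that block only.
    public static List<string> HashBlocks(string file, int blockSize = 16 * 1024)
    {
        var hashes = new List<string>();
        byte[] buffer = new byte[blockSize];

        using (var sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(file))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Each block gets its own complete hash (the last block
                // may be shorter than blockSize).
                byte[] hash = sha.ComputeHash(buffer, 0, read);
                hashes.Add(BitConverter.ToString(hash).Replace("-", string.Empty));
            }
        }
        return hashes;
    }
}
```

To check for changes later, recompute the list and compare it element by element with the stored one; only the blocks whose hashes differ have changed.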


Lack of disk space to store hashes is a problem I can't solve in a post; buy a USB stick?
