Can this checksum algorithm be improved?


Problem Description


We have a very old, unsupported program which copies files across SMB shares. It has a checksum algorithm to determine if the file contents have changed before copying. The algorithm seems easily fooled -- we've just found an example where two files, identical except for a single '1' changed to a '2', return the same checksum. Here's the algorithm:

unsigned long GetFileCheckSum(CString PathFilename)
{
        FILE* File;
        unsigned long CheckSum = 0;
        unsigned long Data = 0;
        unsigned long Count = 0;

        if ((File = fopen(PathFilename, "rb")) != NULL)
        {
                // Read the file one 4-byte (sizeof(unsigned long)) chunk at a time
                while (fread(&Data, 1, sizeof(unsigned long), File) != FALSE)
                {
                        // XOR each chunk, offset by a running chunk counter, into the checksum
                        CheckSum ^= Data + ++Count;
                        // Zero Data so a short final read isn't padded with stale bytes
                        Data = 0;
                }
                fclose(File);
        }
        return CheckSum;
}

I'm not much of a programmer (I am a sysadmin) but I know an XOR-based checksum is going to be pretty crude. What're the chances of this algorithm returning the same checksum for two files of the same size with different contents? (I'm not expecting an exact answer, "remote" or "quite likely" is fine.)
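
To make "crude" concrete: under this scheme the hypothetical two-chunk inputs (1, 2) and (3, 0) collide, since (1+1) ^ (2+2) and (3+1) ^ (0+2) both equal 6. A minimal C++ sketch demonstrating this (the input values are illustrative, not from the question's actual files):

#include <stdio.h>

int main(void)
{
        // Two different hypothetical "files", each two 4-byte chunks long
        unsigned long a[2] = { 1, 2 };
        unsigned long b[2] = { 3, 0 };
        unsigned long sumA = 0, sumB = 0;

        for (unsigned long count = 1; count <= 2; count++)
        {
                sumA ^= a[count - 1] + count;   // same mixing step as GetFileCheckSum
                sumB ^= b[count - 1] + count;
        }
        printf("%lu %lu\n", sumA, sumB);        // prints "6 6" -- a collision
        return 0;
}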

How could it be improved without a huge performance hit?

Lastly, what's going on with the fread()? I had a quick scan of the documentation but I couldn't figure it out. Is Data being set to each byte of the file in turn? Edit: okay, so it's reading the file in unsigned long chunks (let's assume a 32-bit OS here). What does each chunk contain? If the contents of the file are abcd, what is the value of Data on the first pass? Is it (in Perl):

(ord('a') << 24) & (ord('b') << 16) & (ord('c') << 8) & ord('d')
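
A quick way to check is a minimal C++ sketch like the one below (it assumes a little-endian x86 machine, where the bytes actually land in the opposite order from the Perl guess above and combine with OR rather than AND):

#include <stdio.h>
#include <string.h>

int main(void)
{
        unsigned long data = 0;
        // Mimic one fread() of the four bytes "abcd" into an unsigned long
        memcpy(&data, "abcd", 4);
        // On little-endian x86 this prints 64636261: 'a' (0x61) is the
        // low byte and 'd' (0x64) the high byte of the 32-bit chunk
        printf("%08lx\n", data);
        return 0;
}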

Solution

MD5 is commonly used to verify the integrity of transferred files. Source code is readily available in C++. It is widely considered to be a fast and accurate algorithm.
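
A minimal sketch of a file digest in C++, assuming the OpenSSL library is available (the function name GetFileMD5Sum and the 8 KB buffer size are illustrative choices, not part of the original program):

#include <openssl/md5.h>
#include <cstdio>
#include <string>

std::string GetFileMD5Sum(const char* pathFilename)
{
        unsigned char digest[MD5_DIGEST_LENGTH];
        unsigned char buffer[8192];
        MD5_CTX ctx;
        size_t bytesRead;

        FILE* file = fopen(pathFilename, "rb");
        if (file == NULL)
                return "";

        MD5_Init(&ctx);
        while ((bytesRead = fread(buffer, 1, sizeof(buffer), file)) > 0)
                MD5_Update(&ctx, buffer, bytesRead);   // hash the file in 8 KB chunks
        MD5_Final(digest, &ctx);
        fclose(file);

        char hex[2 * MD5_DIGEST_LENGTH + 1];
        for (int i = 0; i < MD5_DIGEST_LENGTH; ++i)
                sprintf(hex + 2 * i, "%02x", digest[i]);   // render the 16-byte digest as hex
        return std::string(hex);
}

Unlike the XOR scheme above, a single changed byte anywhere in the file changes the digest with overwhelming probability.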

See also http://stackoverflow.com/questions/122982/robust-and-fast-checksum-algorithm
