Can this checksum algorithm be improved?


Problem Description


We have a very old, unsupported program which copies files across SMB shares. It has a checksum algorithm to determine if the file contents have changed before copying. The algorithm seems easily fooled -- we've just found an example where two files, identical except a single '1' changing to a '2', return the same checksum. Here's the algorithm:

unsigned long GetFileCheckSum(CString PathFilename)
{
        FILE* File;
        unsigned long CheckSum = 0;
        unsigned long Data = 0;
        unsigned long Count = 0;

        if ((File = fopen(PathFilename, "rb")) != NULL)
        {
                while (fread(&Data, 1, sizeof(unsigned long), File) != FALSE)
                {
                        CheckSum ^= Data + ++Count;
                        Data = 0;
                }
                fclose(File);
        }
        return CheckSum;
}

I'm not much of a programmer (I am a sysadmin) but I know an XOR-based checksum is going to be pretty crude. What're the chances of this algorithm returning the same checksum for two files of the same size with different contents? (I'm not expecting an exact answer, "remote" or "quite likely" is fine.)

How could it be improved without a huge performance hit?

Lastly, what's going on with the fread()? I had a quick scan of the documentation but I couldn't figure it out. Is Data being set to each byte of the file in turn? Edit: okay, so it's reading the file into unsigned long (let's assume a 32-bit OS here) chunks. What does each chunk contain? If the contents of the file are abcd, what is the value of Data on the first pass? Is it (in Perl):

(ord('a') << 24) & (ord('b') << 16) & (ord('c') << 8) & ord('d')

Solution

MD5 is commonly used to verify the integrity of transferred files. Source code is readily available in C++. It is widely considered to be a fast and accurate algorithm.

See also http://stackoverflow.com/questions/122982/robust-and-fast-checksum-algorithm
