Verifying that two files are identical using pure PHP?

Question

TL;DR: I have a CMS that stores attachments (opaque files) using the SHA-1 of the file contents as the filename. How can I verify that an uploaded file really matches the one in storage, given that I already know the SHA-1 hashes of both files match? I'd like the check to be fast.

Long version:

When a user uploads a new file to the system, I compute the SHA-1 hash of the uploaded file contents and then check whether a file with an identical hash already exists in the storage backend. PHP puts the uploaded file in /tmp before my code gets to run, and then I run sha1sum against the uploaded file to get the SHA-1 hash of its contents. I then compute the fanout from that hash and decide the storage directory under the NFS-mounted directory hierarchy. (For example, if the SHA-1 hash of a file's contents is 37aefc1e145992f2cc16fabadcfe23eede5fb094, the permanent file name is /nfs/data/files/37/ae/fc1e145992f2cc16fabadcfe23eede5fb094.) In addition to saving the actual file contents, I INSERT a new row into a SQL database for the user-submitted metadata (e.g. Content-Type, original filename, datestamp, etc.).
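The fan-out layout described above can be sketched in PHP as follows. This is a minimal illustration, assuming the `/nfs/data/files` root and the two-level, two-hex-digit split from the example; the function name `fanoutPath()` is hypothetical. PHP's built-in `sha1_file()` returns the same lowercase hex digest that sha1sum prints, so the hash could also be computed in-process instead of shelling out.

```php
<?php
declare(strict_types=1);

/**
 * Map a 40-character lowercase hex SHA-1 digest to its fan-out path.
 * Hypothetical helper: the default root and the 2/2/36 character split
 * follow the example in the question.
 */
function fanoutPath(string $sha1, string $root = '/nfs/data/files'): string
{
    // Reject anything that is not a lowercase hex SHA-1 digest, so a
    // malformed value can never yield a path containing "../".
    if (preg_match('/\A[0-9a-f]{40}\z/', $sha1) !== 1) {
        throw new InvalidArgumentException("Not a SHA-1 hex digest: $sha1");
    }

    // First two hex digits -> first directory level, next two -> second
    // level, remaining 36 -> file name.
    return sprintf(
        '%s/%s/%s/%s',
        rtrim($root, '/'),
        substr($sha1, 0, 2),
        substr($sha1, 2, 2),
        substr($sha1, 4)
    );
}

echo fanoutPath('37aefc1e145992f2cc16fabadcfe23eede5fb094'), PHP_EOL;
// prints /nfs/data/files/37/ae/fc1e145992f2cc16fabadcfe23eede5fb094
```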

The corner case I'm currently figuring out is a newly uploaded file whose SHA-1 hash matches an existing hash in the storage backend. I know that the chances of this happening by accident are astronomically low, but I'd like to be sure.

Given two filenames $file_a and $file_b, how can I quickly check whether both files have identical contents? Assume that the files are too big to be loaded into memory. In Python I'd use filecmp.cmp(), but PHP does not seem to have anything similar. I know this can be done with fread(), aborting as soon as a non-matching byte is found, but I'd rather not write that code.

Solution

If you already have one SHA-1 sum, you can simply do:

if ($known_sha1 === sha1_file($new_file))

Otherwise:

if (filesize($file_a) === filesize($file_b)
    && md5_file($file_a) === md5_file($file_b)
)

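Checking the file size as well is a cheap extra guard against a hash collision (which is already very unlikely). MD5 is used because it is typically faster to compute than the SHA algorithms, but it is also weaker: MD5 collisions can be constructed deliberately, so matching digests still do not prove that the files are identical.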
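A slightly more defensive version of the size-plus-hash check might look like this. It is a sketch, not the answer's exact code: the function name `sizeAndMd5Match()` is hypothetical, it compares with `===` because loose `==` treats two numeric-looking strings (such as digests of the form "0e" followed only by digits) as numbers, and it treats a `false` return from `filesize()` or `md5_file()` as a non-match.

```php
<?php
declare(strict_types=1);

/**
 * Cheap pre-check: same size and same MD5 digest.
 * A true result means "almost certainly identical", not "proven identical".
 */
function sizeAndMd5Match(string $fileA, string $fileB): bool
{
    $sizeA = filesize($fileA);
    $sizeB = filesize($fileB);
    // filesize() returns false on failure; differing sizes rule out a match
    // without reading any file contents.
    if ($sizeA === false || $sizeB === false || $sizeA !== $sizeB) {
        return false;
    }

    $md5A = md5_file($fileA);
    $md5B = md5_file($fileB);
    // Strict comparison, so two digests are never compared as numbers.
    return $md5A !== false && $md5A === $md5B;
}
```

Because `md5_file()` reads each file to the end, this check costs roughly as much I/O as a full byte-by-byte comparison of two identical files, yet unlike that comparison it still does not prove the contents are equal.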


Update:

This is how to compare two files exactly, byte for byte, stopping at the first chunk that differs:

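function compareFiles($file_a, $file_b)
{
    // Files of different sizes can never be identical, so bail out early.
    if (filesize($file_a) !== filesize($file_b))
    {
        return false;
    }

    $fp_a = fopen($file_a, 'rb');
    $fp_b = fopen($file_b, 'rb');
    if ($fp_a === false || $fp_b === false)
    {
        // One of the files could not be opened; close whichever one did.
        if ($fp_a !== false) fclose($fp_a);
        if ($fp_b !== false) fclose($fp_b);
        return false;
    }

    $same = true;
    // fread() returns '' (not false) at end of file, so loop on feof();
    // looping on "fread() !== false" would never terminate.
    while (!feof($fp_a))
    {
        $b_a = fread($fp_a, 4096);
        $b_b = fread($fp_b, 4096);
        if ($b_a === false || $b_a !== $b_b)
        {
            // Read error or first differing chunk: stop reading immediately.
            $same = false;
            break;
        }
    }

    fclose($fp_a);
    fclose($fp_b);

    return $same;
}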
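The upload flow from the question can be tied to an exact comparison roughly as follows. This is a sketch under assumptions: `sameBytes()` and `classifyUpload()` are hypothetical names, the `'new'`/`'duplicate'`/`'collision'` outcomes are illustrative, the `/nfs/data/files` root and 2/2/36 fan-out come from the question's example, and moving the file and inserting the metadata row are left out. It carries its own chunked byte comparison so that it runs on its own.

```php
<?php
declare(strict_types=1);

/** Chunked byte-for-byte comparison that stops at the first differing chunk. */
function sameBytes(string $fileA, string $fileB, int $chunk = 8192): bool
{
    if (filesize($fileA) !== filesize($fileB)) {
        return false;
    }
    $fa = fopen($fileA, 'rb');
    $fb = fopen($fileB, 'rb');
    $same = $fa !== false && $fb !== false;
    while ($same && !feof($fa)) {
        // fread() returns '' at EOF and false on a read error.
        $a = fread($fa, $chunk);
        $same = $a !== false && $a === fread($fb, $chunk);
    }
    if ($fa !== false) {
        fclose($fa);
    }
    if ($fb !== false) {
        fclose($fb);
    }
    return $same;
}

/**
 * Hypothetical classifier for an uploaded file:
 * 'new'       nothing is stored under its SHA-1 yet,
 * 'duplicate' the stored file has identical bytes,
 * 'collision' same SHA-1 but different bytes.
 */
function classifyUpload(string $uploadPath, string $root = '/nfs/data/files'): string
{
    $sha1 = sha1_file($uploadPath);
    if ($sha1 === false) {
        throw new RuntimeException("Cannot read $uploadPath");
    }
    // Same 2/2/36 fan-out as in the question's example.
    $stored = sprintf('%s/%s/%s/%s', rtrim($root, '/'),
        substr($sha1, 0, 2), substr($sha1, 2, 2), substr($sha1, 4));

    if (!is_file($stored)) {
        return 'new';
    }
    return sameBytes($uploadPath, $stored) ? 'duplicate' : 'collision';
}
```

In this arrangement a `'duplicate'` result would let the uploaded copy be discarded while the metadata row is still inserted, and `'collision'` is the astronomically unlikely case the question is about.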
