How to detect whether two files are identical in Python

Question

Is making a system call to "md5sum file1" and "md5sum file2" and comparing the two returned values enough in this case?

Recommended answer

Well, that will tell you whether they're definitely different or probably the same. It's possible for two files to have the same hash but not actually contain the same data; it's just very unlikely.

In your situation, what is the impact if you get a false positive (i.e. if you think they're the same, but they're not)? MD5 is probably good enough not to worry about collisions if they would only occur accidentally, but if you've got security (or money) at stake and someone could plant a "bad" file with the same hash as a "good" file, you shouldn't rely on it.
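As a rough illustration of the hash comparison discussed above, here is a minimal sketch that uses Python's standard hashlib module instead of shelling out to md5sum. The function names hex_digest_of and probably_same, the 64 KiB chunk size, and the SHA-256 default are assumptions made for this example, not part of the original answer; pass "md5" as the algorithm if you want the same digest that md5sum prints.

```python
import hashlib


def hex_digest_of(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    The helper name and defaults are hypothetical choices for this sketch.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def probably_same(path_a, path_b, algorithm="sha256"):
    """True if both files hash to the same digest.

    Equal digests mean "probably the same"; different digests mean
    "definitely different".
    """
    return hex_digest_of(path_a, algorithm) == hex_digest_of(path_b, algorithm)
```

Reading in chunks keeps memory use flat for large files. Note that a True result here is still only "probably the same" in the sense described above.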

Personally, I'd probably just read both files and compare them byte by byte. For a one-off comparison, both hashing and this approach have to read the whole of both files when they're equal, but as Daniel points out in the comments, a byte-by-byte comparison lets you exit early as soon as you see a difference. Comparing the file sizes first is another quick optimization :)
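The byte-by-byte approach with the size check might look something like the sketch below. The function name files_identical and the chunk size are assumptions for illustration, and the code assumes both paths are regular files on disk.

```python
import os


def files_identical(path_a, path_b, chunk_size=65536):
    """Compare two regular files by content, stopping at the first difference.

    Hypothetical helper for this sketch; not from the original answer.
    """
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False

    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit on the first mismatching chunk
            if not chunk_a:
                return True  # both files hit EOF with no difference found
```

The standard library's filecmp.cmp(path_a, path_b, shallow=False) performs a similar content comparison, so in practice that may be all you need.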

The general advantage of hashing shows up when you store the hash of the existing file somewhere, so that next time you only need to read (and hash) the new file and compare the result against the stored value.
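To make the stored-hash idea concrete, here is a small sketch that keeps digests in a JSON file. Everything about the storage is an assumption for illustration: the JSON format, the key scheme, and the function names sha256_of, remember_digest, and matches_stored_digest are all hypothetical.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks (hypothetical helper)."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def remember_digest(path, store_path, key):
    """Record the digest of `path` under `key` in a JSON file at `store_path`."""
    stored = {}
    if os.path.exists(store_path):
        with open(store_path) as f:
            stored = json.load(f)
    stored[key] = sha256_of(path)
    with open(store_path, "w") as f:
        json.dump(stored, f)


def matches_stored_digest(new_path, store_path, key):
    """Hash only the new file and compare it with the remembered digest."""
    with open(store_path) as f:
        stored = json.load(f)
    return stored.get(key) == sha256_of(new_path)
```

With this layout the existing file is read once, when remember_digest is called; each later check reads only the new file.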
