To check if two image files are the same: checksum or hash?


Problem description

I am writing some image-processing code in which I download images (as BufferedImage) from URLs and pass them on to an image processor.

I want to avoid passing the same image to the image processor more than once, because the image-processing operation is expensive. The URLs of identical images may differ, so I cannot prevent this by comparing URLs. I was therefore planning to compute a checksum or hash to detect whether the code is encountering the same image again.

For MD5 I tried Fast MD5, and for a sample image it generated a hex checksum value more than 20K characters long. Storing a 20K+ character hash would obviously be a problem when it comes to database storage. So I tried CRC32 (from java.util.zip.CRC32), and it did generate a much shorter checksum than the hash.

I understand that checksums and hashes serve different purposes. For the purpose explained above, can I just use CRC32? Would it be sufficient, or do I need to try something other than these two?

Thanks, Abi

Recommended answer

The difference between CRC and, say, MD5 is that it is more difficult to tamper with a file so that it matches a "target" MD5 than to tamper with it so that it matches a "target" checksum. Since this does not seem to be a concern for your program, it should not matter which method you use. MD5 might be a little more CPU-intensive, but I do not know whether that difference will matter.

The main question should be the number of bytes in the digest.
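To make the digest sizes concrete, here is a minimal sketch that computes both values for a byte array using only the JDK (java.util.zip.CRC32 and java.security.MessageDigest). The class and method names are illustrative, and it assumes the raw downloaded bytes are available rather than only the decoded BufferedImage. A CRC32 is 32 bits (8 hex characters) and an MD5 digest is 128 bits (32 hex characters) regardless of input size, so a 20K-character value is probably not the digest itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Illustrative helper, not part of the original question or answer.
final class DigestLengths {

    /** CRC32 of the data, formatted as 8 hex characters (32 bits). */
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return String.format("%08x", crc.getValue());
    }

    /** MD5 of the data, formatted as 32 hex characters (128 bits). */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of a downloaded image.
        byte[] sample = "stand-in for image bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println("CRC32: " + crc32Hex(sample));
        System.out.println("MD5:   " + md5Hex(sample));
    }
}
```

Either value fits comfortably in a fixed-width database column.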

If you store the checksum in a 32-bit integer, then for a file of 2 KB (16,384 bits) you are mapping 2^16384 possible files onto 2^32 checksum values, so on average each CRC value is shared by 2^16352 possible files. With a 128-bit MD5, each digest value is shared by 2^16256 possible files. What matters in practice is the chance that two different files end up with the same value: roughly 1 in 2^32 for CRC32 versus 1 in 2^128 for MD5, assuming evenly distributed values.

The bigger the code you compute, the less likely collisions are (assuming the computed codes are evenly distributed), and so the safer the comparison.
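As a rough illustration of how digest size affects collision risk across a whole collection, the sketch below uses the standard birthday approximation, P ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n items and a b-bit code. It assumes evenly distributed codes, as stated above. The class name and the figure of 100,000 images are made up for the example.

```java
// Illustrative estimate only; assumes uniformly distributed digest values.
final class CollisionEstimate {

    /**
     * Birthday approximation of the probability that at least two of
     * n distinct inputs share the same b-bit code.
     */
    static double collisionProbability(long n, int bits) {
        double pairs = (double) n * (n - 1) / 2.0; // number of distinct pairs
        double space = Math.pow(2.0, bits);        // number of possible codes
        // 1 - exp(-x), computed via expm1 to stay accurate for tiny x.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long images = 100_000L; // hypothetical collection size
        System.out.println("32-bit code:  " + collisionProbability(images, 32));
        System.out.println("128-bit code: " + collisionProbability(images, 128));
    }
}
```

Under these assumptions a 32-bit code is more likely than not to produce at least one accidental collision at that collection size, while a 128-bit code makes one vanishingly unlikely.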

Anyway, to minimize possible errors, I think the first check should be the file size: compare the file sizes first and, only if they match, compare the checksums/hashes.
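The size-then-hash check might be sketched as follows. This is a hypothetical helper, not code from the question: the names SeenImages, likelySame and markIfNew are invented, MD5 is just one possible digest, and it again assumes the raw downloaded bytes are at hand. Matching size and digest only makes equality very likely. A byte-by-byte comparison would be needed for certainty.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry of images that were already processed.
final class SeenImages {

    // File size in bytes -> digests of the images seen with that size.
    private final Map<Integer, Set<String>> digestsBySize = new HashMap<>();

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Compares sizes first; digests are computed only when sizes match. */
    static boolean likelySame(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false; // different sizes can never be the same file
        }
        return MessageDigest.isEqual(md5(a), md5(b));
    }

    /** Records the image and returns true if it was not seen before. */
    boolean markIfNew(byte[] data) {
        // The size acts as a first-level key, the digest as the second.
        Set<String> sameSize =
                digestsBySize.computeIfAbsent(data.length, k -> new HashSet<>());
        return sameSize.add(Base64.getEncoder().encodeToString(md5(data)));
    }
}
```

A caller would invoke markIfNew on each download and pass the image to the processor only when it returns true.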
