Removing Duplicate Images

Question

We have a collection of photo images totalling a few hundred gigabytes. A large number of the photos are visual duplicates, but with differing file sizes, resolutions, compression levels, etc.

Is it possible to use any specific image processing methods to search out and remove these duplicate images?

Recommended Answer

I recently wanted to accomplish this task for a PHP image gallery. I wanted to be able to generate a "fuzzy" fingerprint for an uploaded image, and check a database for any images that had the same fingerprint, indicating they were similar, and then compare them more closely to determine how similar.

I accomplished it by resizing the uploaded image to 150 pixels wide, reducing it to greyscale, rounding the value of each pixel off to the nearest multiple of 16 (giving 17 possible shades of grey between 0 and 255), normalising the counts and storing them in an array, thereby creating a "fuzzy" colour histogram, and then creating an md5sum of the histogram which I could search for in my database. This was extremely effective in narrowing down images which were very visually similar to the uploaded file.
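As a rough illustration of the fingerprinting idea, here is a minimal Python sketch (not the PHP/GD code from the answer). It assumes the image has already been resized and converted to greyscale, so the input is just a sequence of 0..255 values. The function name `fuzzy_fingerprint`, the choice to normalise bin counts to whole percentages, and the comma-joined serialisation fed to MD5 are all assumptions made for the example; the original may normalise and serialise differently.

```python
import hashlib


def fuzzy_fingerprint(gray_pixels):
    """Return an MD5 hex digest of a coarse greyscale histogram.

    gray_pixels: iterable of ints in 0..255 (assumed to come from an
    image that was already resized and reduced to greyscale).
    """
    # 17 bins: the nearest multiples of 16 are 0, 16, ..., 256.
    counts = [0] * 17
    total = 0
    for value in gray_pixels:
        # (value + 8) // 16 is the index of the nearest multiple of 16.
        counts[(value + 8) // 16] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")

    # Assumption: normalise to whole percentages so that the pixel
    # count of the image does not affect the fingerprint.
    normalised = [round(100 * c / total) for c in counts]

    # Serialise the histogram and hash it; equal histograms give equal hashes.
    key = ",".join(str(n) for n in normalised)
    return hashlib.md5(key.encode("ascii")).hexdigest()
```

Two images whose coarse histograms match produce the same digest, so an exact-match database lookup on the digest returns the candidate near-duplicates. Images whose histograms differ in even one bin hash differently, which is why a closer second-stage comparison of the candidates is still useful.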

Then, to compare the uploaded file against each "similar" image in the database, I took both images, resized them to 16x16, and analysed them pixel by pixel: I subtracted the RGB value of each pixel from the value of the corresponding pixel in the other image, added all the differences together and divided by the number of pixels, giving me an average colour deviation. Anything less than a specific value was determined to be a duplicate.
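The pixel-by-pixel comparison might look something like the following Python sketch (again, not the author's PHP). It assumes both images have already been resized to the same dimensions and are supplied as equal-length lists of `(r, g, b)` tuples. Using absolute differences is an assumption, made so that positive and negative differences do not cancel out; the names `average_colour_deviation` and `is_duplicate` and the default threshold of 30 are hypothetical, since the answer does not state the cut-off it used.

```python
def average_colour_deviation(pixels_a, pixels_b):
    """Mean per-pixel sum of absolute R, G and B differences.

    pixels_a, pixels_b: equal-length lists of (r, g, b) tuples, e.g. the
    256 pixels of two images already resized to 16x16.
    """
    if not pixels_a or len(pixels_a) != len(pixels_b):
        raise ValueError("images must be non-empty and the same size")

    total = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(pixels_a, pixels_b):
        # Assumption: absolute differences, summed over the three channels.
        total += abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2)
    return total / len(pixels_a)


def is_duplicate(pixels_a, pixels_b, threshold=30.0):
    """True if the deviation is below a (hypothetical) threshold."""
    return average_colour_deviation(pixels_a, pixels_b) < threshold
```

The threshold would need tuning against a sample of known duplicates and non-duplicates, since it depends on how much recompression and rescaling noise the collection contains.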

The whole thing is written in PHP using the GD module, and a comparison against thousands of images takes only a few hundred milliseconds per uploaded file.

My code and methodology are here: http://www.catpa.ws/php-duplicate-image-finder/
