Duplicate photo searching by comparing only pure image data and image similarity?

Views: 315 Published: 2020/6/12 19:39:47 Tags: image perl bash image-processing duplicate-removal


Problem description

I have approximately 600GB of photos collected over 13 years, now stored on a FreeBSD ZFS server.

The photos come from family computers, from several partial backups to different external USB HDDs, from images reconstructed after disk disasters, and from different photo manipulation software (iPhoto, Picasa, HP and many others :( ), spread over several deep subdirectories. In short: a TERRIBLE MESS with many duplicates.

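So what I have done first:

- searched the tree for files of the same size (fast) and made an md5 checksum for those
- collected the duplicate images (same size + same md5 = duplicate)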
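
The two steps above can be sketched roughly as follows. This is only a minimal illustration in Python using the standard library, not the script actually used here; the function names `file_md5` and `find_exact_duplicates` are made up for the example. Files are grouped by size first, so the md5 is computed only for files that share their size with at least one other file.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under root that share both size and md5."""
    # Pass 1: group every file by size (cheap, needs no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size is not unique.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_md5(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("/path/to/photos")` (the path is a placeholder) returns one list per group of byte-identical files; nothing is deleted.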

This helped a lot, but there are still MANY MANY duplicates:

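- photos that differ only in the exif/iptc data added by some photo management software, while the image is the same (or at least "looks the same" and has the same dimensions)
- or they are only resized versions of the original image
- or they are "enhanced" versions of the originals, etc.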

Now the questions:

- How can I find duplicates by checksumming only the "pure image bytes" in a JPG, without exif/IPTC and similar meta information? I want to filter out the photo duplicates that differ only in their exif tags while the image is the same (so file checksumming doesn't work, but image checksumming could...). This is (I hope) not very complicated, but I need some direction.
- Which perl module can extract the "pure" image data from a JPG file in a form usable for comparison/checksumming?
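
One possible reading of "pure image bytes" is: everything in the JPEG file except the APPn segments (where exif, IPTC and XMP data normally live) and the COM comment segments. The sketch below is a rough Python illustration of that idea using only the standard library; it is not a Perl module, and the function name `image_data_digest` is made up for the example. It assumes a well-formed JPEG and hashes the remaining segments plus the compressed scan data. Note the caveats: APP segments can also carry data that affects rendering (for example an ICC profile in APP2), and a photo that was re-encoded rather than just re-tagged will have different scan bytes, so it will not match.

```python
import hashlib
import struct


def image_data_digest(jpeg_bytes: bytes) -> str:
    """MD5 of a JPEG byte string with APPn and COM segments skipped."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    digest = hashlib.md5()
    pos = 2  # start right after the SOI marker
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte in front of a marker
            pos += 1
            continue
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xDA:
            # SOS: the compressed scan data follows, hash through end of file.
            digest.update(jpeg_bytes[pos:])
            return digest.hexdigest()
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE  # APPn or COM
        if not is_metadata:
            digest.update(jpeg_bytes[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no SOS marker found")
```

Two files whose digests match under this scheme carry the same tables and scan data and differ at most in metadata segments, so they are good candidates for the "same image, different exif" case.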

More complicated

How can I find "similar" images that are only
- resized versions of the originals
- "enhanced" versions of the originals (from some photo manipulation programs)
I'm able to write complex scripts in BASH and "+-" :) know perl. I can use FreeBSD/Linux utilities directly on the server, and over the network I can use OS X (but working with 600GB over the LAN is not the fastest way)...
  
My idea:
- delete images only at the end of the workflow
- use an Image::ExifTool script to collect candidate duplicates based on image creation date and camera model (maybe other exif data too)
- make a checksum of the pure image data (or extract a histogram; the same images should have the same histogram), not sure about this
- use some similarity detection to find duplicates based on resizing and photo enhancement, no idea how to do this...
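
On the histogram idea: identical images do have identical histograms, but the converse does not hold, which is probably a reason to be unsure about it. A small Python sketch makes this concrete. It assumes the image has already been decoded into rows of `(r, g, b)` tuples (the decoding step is not shown), and `channel_histograms` is a made-up helper name.

```python
from collections import Counter


def channel_histograms(pixels):
    """Return one Counter per RGB channel, mapping value -> pixel count."""
    histograms = [Counter(), Counter(), Counter()]
    for row in pixels:
        for pixel in row:
            for channel, value in enumerate(pixel):
                histograms[channel][value] += 1
    return histograms


# A tiny 2x2 image and its horizontally mirrored copy.
image = [[(10, 20, 30), (200, 100, 50)],
         [(0, 0, 0), (255, 255, 255)]]
mirrored = [list(reversed(row)) for row in image]

# The pixel layout differs, yet the histograms are the same.
print(image != mirrored)
print(channel_histograms(image) == channel_histograms(mirrored))
```

A histogram match is therefore only a hint that two files may show the same picture, and enhanced versions of a photo will usually have a different histogram from the original.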
Any idea, help, or (software/algorithm) hint on how to make order in the chaos?
  
Ps:

Here is a nearly identical question: Finding Duplicate image files, but I'm already done with that answer (md5) and am looking for more precise checksumming and image comparison algorithms.
  
Recommended answer
  
Have you looked at this article by Randal Schwartz? He uses a perl script with ImageMagick to produce resized (4x4 RGB grid) versions of the pictures, which he then compares in order to flag "similar" pictures.
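
The general idea of comparing tiny thumbnails can be illustrated as below. This is not the script from the article and does not use ImageMagick; it is a rough Python sketch that assumes each image is already decoded into rows of `(r, g, b)` tuples and is at least 4x4 pixels. The names `grid_signature` and `looks_similar`, and the tolerance of 10, are assumptions chosen for the example.

```python
def grid_signature(pixels, grid=4):
    """Average RGB of each cell in a grid x grid partition of the image.

    Assumes the image is at least grid pixels wide and high.
    """
    height, width = len(pixels), len(pixels[0])
    signature = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                signature.append(sum(p[channel] for p in cell) / len(cell))
    return signature


def looks_similar(sig_a, sig_b, tolerance=10.0):
    """True if the mean absolute per-value difference is within tolerance."""
    diff = sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)
    return diff <= tolerance
```

Because each signature has only 48 numbers (4 x 4 cells, 3 channels), a resized copy ends up with nearly the same signature as the original, and a mildly enhanced copy stays within a small tolerance. The tolerance has to be tuned on real photos; too loose a value will flag unrelated pictures.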
  
