Algorithm for finding similar images using an index

Question

There are some surprisingly good image comparison tools that find similar images even when they are not exactly the same (e.g. changes in size, wallpaper, brightness/contrast). Here are some example applications:

  • Unique Filer 1.4 (shareware): https://web.archive.org/web/20010309014927/http://uniquefiler.com/
  • Fast Duplicate File Finder (Freeware): http://www.mindgems.com/products/Fast-Duplicate-File-Finder/Fast-Duplicate-File-Finder-About.htm
  • Visual similarity duplicate image finder (payware): http://www.mindgems.com/products/VS-Duplicate-Image-Finder/VSDIF-About.htm
  • Duplicate Checker (payware): http://www.duplicatechecker.com/

I only tried the first one, but all of them are developed for Windows and none are open source. Unique Filer was released in 2000 and its homepage seems to have disappeared. It was surprisingly fast (even on computers from that year) because it used an index: comparing some 10000 images using the index took only a few seconds, and updating the index was a scalable process.

Since this algorithm has existed in a very effective form for at least 15 years, I assume it is well documented and possibly already implemented as an open source library. Does anyone know more about which algorithm or image detection theory was used to implement these applications? Is there perhaps even an open source implementation available?

I already checked the question "Algorithm for finding similar images", but all of its answers solve the problem by comparing one image to another. For 1000+ images this results in on the order of 1000^2 comparison operations, which is not what I'm looking for.

Recommended answer

The problem you are describing is more generally called Nearest Neighbor Search. Since you are asking for high efficiency on large datasets, Approximate Nearest Neighbor Search is what you are after.

An efficient technique for this is Locality-Sensitive Hashing (LSH, http://en.wikipedia.org/wiki/Locality-sensitive_hashing), for which these slides give a great overview. Its basic idea is to use hash functions that project all data to a low-dimensional space, with the constraint that hashes of similar data collide with high probability and hashes of dissimilar data collide with low probability. These probabilities are parameters of the algorithm, and they let you tune the trade-off between accuracy and efficiency.
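
To make the idea concrete, here is a minimal sketch of one common LSH family, random-hyperplane hashing (it approximates cosine similarity). It is not the method used by any of the tools mentioned in the question, and it is not LSHKIT's API. The names `make_hyperplanes`, `lsh_key` and `LSHIndex` are hypothetical, and the sketch assumes each image has already been reduced to a fixed-length feature vector, for example a small grayscale thumbnail with the mean subtracted.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


class LSHIndex:
    """Several hash tables, each keyed by an independent set of hyperplanes.

    More bits per key -> fewer false candidates per table.
    More tables       -> fewer missed neighbours.
    These two knobs are the accuracy/efficiency parameters.
    """

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        self.tables = []
        for t in range(num_tables):
            planes = make_hyperplanes(num_bits, dim, seed + t)
            self.tables.append((planes, defaultdict(list)))

    def add(self, name, vector):
        """Insert an item into one bucket of every table."""
        for planes, buckets in self.tables:
            buckets[lsh_key(vector, planes)].append(name)

    def candidates(self, vector):
        """Return every item sharing a bucket with the query in at least one table."""
        found = set()
        for planes, buckets in self.tables:
            found.update(buckets.get(lsh_key(vector, planes), []))
        return found


if __name__ == "__main__":
    index = LSHIndex(dim=4)
    index.add("photo_a", [0.5, -1.0, 2.0, 0.25])
    index.add("photo_b", [-2.0, 0.5, -0.5, 1.0])
    # Query with a uniformly scaled copy of photo_a (e.g. a contrast change
    # applied to mean-centred pixels); only the candidates it returns need
    # an exact comparison afterwards.
    print(index.candidates([1.0, -2.0, 4.0, 0.5]))
```

Each lookup only touches one bucket per table, so finding candidates for an image does not require comparing it against every other image. The candidates still need a precise comparison afterwards, because unrelated images can land in the same bucket by chance.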

LSHKIT is an open-source implementation of LSH.
