Good way to identify similar images?


Question

I've developed a simple and fast algorithm in PHP to compare images for similarity.

It's fast to hash (~40 images per second at 800x600), and an unoptimised search algorithm can go through 3,000 images in 22 minutes, comparing each one against the others (3/sec).

The basic overview is that you take an image, rescale it to 8x8, and convert those pixels to HSV. The hue, saturation and value are then each truncated to 4 bits, and the result becomes one big hex string.
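The hashing step described above could be sketched roughly like this in Python (the original is in PHP and isn't shown, so this is only an illustration). The function names `downscale` and `image_hash`, the box-average downscaling, and the H, S, V digit order per cell are all assumptions, and the input is assumed to be a grid of RGB tuples at least 8 pixels wide and tall.

```python
import colorsys

def downscale(pixels, size=8):
    """Box-average a 2-D grid of (r, g, b) tuples (0-255) down to size x size.

    Assumes the grid is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Average each of the three colour channels over the block.
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        out.append(row)
    return out

def image_hash(pixels, size=8):
    """Return a hex string with three 4-bit digits (H, S, V) per downscaled cell."""
    digits = []
    for row in downscale(pixels, size):
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            for channel in (h, s, v):
                # Truncate the 0..1 channel to 4 bits; clamp 1.0 to 15.
                digits.append(format(min(int(channel * 16), 15), "x"))
    return "".join(digits)
```

With an 8x8 grid this gives 64 cells x 3 digits = 192 hex digits per image.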

Comparing two images basically walks along the two strings and adds up the differences it finds. If the total is below 64, it's the same image. Different images usually score around 600 - 800. Below 20 means extremely similar.
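A minimal sketch of that comparison in Python, assuming the "difference" is the absolute difference between corresponding hex digits (the post doesn't spell this out). The names `hash_distance` and `classify` are made up for illustration; the thresholds 20 and 64 are the ones quoted above.

```python
def hash_distance(hash_a, hash_b):
    """Sum of absolute differences between corresponding hex digits."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(abs(int(a, 16) - int(b, 16)) for a, b in zip(hash_a, hash_b))

def classify(distance):
    """Map a distance onto the thresholds quoted in the question."""
    if distance < 20:
        return "extremely similar"
    if distance < 64:
        return "same image"
    return "different"
```

For example, `classify(hash_distance(h1, h2))` on two stored hash strings gives one of the three labels.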

Are there any improvements to this model I can use? I haven't looked at how relevant the different components (hue, saturation and value) are to the comparison. Hue is probably quite important, but what about the others?
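One way to start looking at this would be to total the differences per channel instead of as one number. The sketch below assumes the digits are stored in H, S, V order for each cell (the post doesn't say), and `channel_distances` is a hypothetical helper. It also treats hue as circular, since hue is an angle and the 4-bit values 0 and 15 are neighbours; that is a suggestion, not something the original algorithm is stated to do.

```python
def channel_distances(hash_a, hash_b):
    """Return the summed digit differences separately for hue, saturation, value.

    Assumes every group of three hex digits is (H, S, V) for one cell.
    """
    names = ("hue", "saturation", "value")
    totals = {name: 0 for name in names}
    for i, (a, b) in enumerate(zip(hash_a, hash_b)):
        diff = abs(int(a, 16) - int(b, 16))
        name = names[i % 3]
        if name == "hue":
            # Hue wraps around, so take the shorter way round the 16 steps.
            diff = min(diff, 16 - diff)
        totals[name] += diff
    return totals
```

Running this over known duplicate and non-duplicate pairs would show which channel separates the two groups best.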

To speed up searches I could probably split the 4 bits from each part in half and put the most significant bits first, so that if they fail the check the least significant bits don't need to be checked at all. I don't know an efficient way to store bits like that while still allowing them to be searched and compared easily.
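One possible way to do the MSB-first check, as a rough sketch: keep the top two bits and the bottom two bits of each digit in separate lists, and use the top bits to get a lower bound on the distance. If two digits differ by d in their top two bits, the full 4-bit values differ by at least 4*d - 3, so a pair can be rejected before the low bits are read. `split_hash` and `is_match` are made-up names, and the default threshold of 64 is the one from the question.

```python
def split_hash(hex_hash):
    """Return (coarse, fine): the top two and bottom two bits of every digit."""
    values = [int(d, 16) for d in hex_hash]
    coarse = [v >> 2 for v in values]
    fine = [v & 3 for v in values]
    return coarse, fine

def is_match(a, b, threshold=64):
    """Compare two split hashes, looking at the low bits only when needed."""
    coarse_a, fine_a = a
    coarse_b, fine_b = b
    # Lower bound on the full distance using only the most significant bits.
    lower = sum(4 * abs(x - y) - 3 for x, y in zip(coarse_a, coarse_b) if x != y)
    if lower >= threshold:
        return False  # cannot be under the threshold, skip the low bits
    total = sum(
        abs(((ca << 2) | fa) - ((cb << 2) | fb))
        for ca, fa, cb, fb in zip(coarse_a, fine_a, coarse_b, fine_b)
    )
    return total < threshold
```

The coarse lists could be packed four digits to a byte for storage; this sketch keeps them as plain lists for readability.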

I've been using a dataset of 3,000 photos (mostly unique) and there haven't been any false positives. It's completely immune to resizes and fairly resistant to brightness and contrast changes.

Recommended answer

What you want to use is:

  1. Feature extraction
  2. Hashing
  3. Locally aware bloom hashing.

  1. Most people use SIFT features, although I've had better experiences with features that are not scale-invariant. Basically you use an edge detector to find interesting points and then center your image patches around those points. That way you can also detect sub-images.

  2. What you implemented is a hash method. There are tons of others to try, but yours should work fine :)

  3. The crucial step to making it fast is to hash your hashes. You convert your values into unary representation and then take a random subset of the bits as the new hash. Do that with 20-50 random samples and you get 20-50 hash tables. If any feature matches 2 or more out of those 50 hash tables, the feature will be very similar to one you already stored. This allows you to convert the abs(x-y)
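The "hash your hashes" step might look something like the following in Python. This is only a sketch of the idea as described, not the answerer's implementation: the class name `LshIndex`, the parameters `n_tables`, `bits_per_key` and `seed`, and the choice of 32 sampled bits per table are all assumptions. Each 4-bit value v is written in unary as v ones followed by 15 - v zeros, so the number of differing bits between two unary codes equals the numeric difference.

```python
import random
from collections import defaultdict

LEVELS = 15  # a 4-bit value v becomes v ones followed by (15 - v) zeros

def to_unary(hex_hash):
    """Expand every hex digit into its 15-bit unary code."""
    bits = []
    for digit in hex_hash:
        v = int(digit, 16)
        bits.extend([1] * v + [0] * (LEVELS - v))
    return bits

class LshIndex:
    def __init__(self, hash_length, n_tables=20, bits_per_key=32, seed=0):
        rng = random.Random(seed)
        total_bits = hash_length * LEVELS
        # One random subset of bit positions per hash table.
        self.samples = [rng.sample(range(total_bits), bits_per_key)
                        for _ in range(n_tables)]
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    def _keys(self, hex_hash):
        bits = to_unary(hex_hash)
        return [tuple(bits[i] for i in sample) for sample in self.samples]

    def add(self, name, hex_hash):
        for table, key in zip(self.tables, self._keys(hex_hash)):
            table[key].add(name)

    def query(self, hex_hash, min_tables=2):
        """Return stored names whose key matches in at least min_tables tables."""
        votes = defaultdict(int)
        for table, key in zip(self.tables, self._keys(hex_hash)):
            for name in table.get(key, ()):
                votes[name] += 1
        return {name for name, count in votes.items() if count >= min_tables}
```

Each lookup is then a fixed number of exact-match table probes rather than a distance computation against every stored image. The number of tables and bits per key would need tuning against real data.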

Hope it helps. If you'd like to try out my self-developed image similarity search, drop me a mail at hajo at spratpix
