Image fingerprint to compare similarity of many images


Question

I need to create fingerprints of many images (about 100,000 existing, 1,000 new per day, RGB, JPEG, max size 800x800) so that every image can be compared to every other image very quickly. I can't use binary comparison methods, because images that are nearly identical should also be recognized as matches.

An existing library would be best, but pointers to existing algorithms would also help me a lot.

Recommended answer

Normal hashing or CRC calculation algorithms do not work well with image data. The dimensional nature of the information must be taken into account.
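To illustrate why ordinary hashes are a poor fit, here is a minimal sketch using only the Python standard library. The byte buffers stand in for raw pixel data and the helper name `digests` is made up for this example; the point is only that a one-bit change yields unrelated digests, so these values say nothing about how similar two inputs are.

```python
import hashlib
import zlib
from typing import Tuple


def digests(data: bytes) -> Tuple[str, int]:
    """Return the SHA-256 hex digest and the CRC32 checksum of a buffer."""
    return hashlib.sha256(data).hexdigest(), zlib.crc32(data)


# Stand-in for raw pixel data (hypothetical values, not a real image).
original = bytes(range(256)) * 4

# Flip a single bit in one byte: visually this would be an invisible change.
tweaked_buffer = bytearray(original)
tweaked_buffer[100] ^= 1
tweaked = bytes(tweaked_buffer)

sha_a, crc_a = digests(original)
sha_b, crc_b = digests(tweaked)

# Both checksums change, and neither tells us the inputs differ by only one bit.
print(sha_a == sha_b, crc_a == crc_b)
```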

If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can apply a Radon transform to the image source to produce a normative mapping of the image data - store this with each image and then compare just the fingerprints. This is a complex algorithm and not for the faint of heart.

A few simple solutions are possible:


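  1. Create a luminosity histogram of the image as its fingerprint

  2. Create a scaled-down version of each image as its fingerprint

  3. Combine techniques (1) and (2) into a hybrid approach for improved comparison quality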

A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image - and can be implemented quite efficiently. Subtracting one histogram from another produces a new histogram, which you can process to decide how similar two images are. Because histograms only evaluate the distribution and occurrence of luminosity/color information, they handle affine transformations quite well. If you quantize each color component's luminosity information down to 8-bit values (for example, 256 bins per channel with one byte per bin), 768 bytes of storage are sufficient for the fingerprint of an image of almost any reasonable size. Luminosity histograms produce false negatives when the color information in an image is manipulated: transformations such as contrast/brightness adjustment, posterization, or color shifting change the luminosity information. False positives are also possible with certain types of images, such as landscapes and images where a single color dominates the others.
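The histogram approach can be sketched in plain Python as follows. This is only an illustration under stated assumptions: pixels are given as a flat sequence of `(r, g, b)` tuples (decoding the JPEG is left to whatever imaging library you use), and the function names `rgb_histogram`, `quantize` and `histogram_distance` are made up for this example. The distance is the sum of absolute bin differences, one possible way to "process" the subtracted histogram.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def rgb_histogram(pixels: Sequence[Pixel]) -> List[float]:
    """Return 768 normalized bins: 256 for red, then green, then blue."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * 768
    for r, g, b in pixels:
        counts[r] += 1
        counts[256 + g] += 1
        counts[512 + b] += 1
    total = len(pixels)
    # Normalizing makes images of different sizes comparable.
    return [c / total for c in counts]


def quantize(histogram: Sequence[float]) -> bytes:
    """Pack each bin into one byte, giving a 768-byte fingerprint."""
    return bytes(min(255, round(v * 255)) for v in histogram)


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences, scaled to 0.0 (same) .. 1.0 (disjoint)."""
    # Each channel sums to 1, so its difference is at most 2; three channels give 6.
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 6.0


image = [(10, 20, 30)] * 4 + [(200, 100, 50)] * 4
flipped = list(reversed(image))  # same pixels in a different order
print(histogram_distance(rgb_histogram(image), rgb_histogram(flipped)))
```

Because the fingerprint ignores where pixels are, the reordered image gets a distance of zero; the same property is what makes two different images with similar color distributions collide.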

Using scaled images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much of the information to be of use - so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike with histogram data, you have to perform anisotropic scaling of the image data when the source resolutions have varying aspect ratios. In other words, reducing a 300x800 image to an 80x80 thumbnail deforms the image, so comparing it with a very similar 300x500 image can produce a false negative. Thumbnail fingerprints also often produce false negatives when affine transformations are involved: if you flip or rotate an image, its thumbnail will be quite different from that of the original, so the match may be missed.
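A thumbnail fingerprint can be sketched like this. The assumptions are loud ones: the image is already decoded into a 2D list of grayscale values (0 to 255), scaling is a simple box average rather than whatever filter a real imaging library would use, and `to_thumbnail` and `thumbnail_distance` are hypothetical names. Forcing every image into a square grid is exactly the anisotropic scaling described above.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]


def to_thumbnail(image: Gray, size: int = 8) -> List[List[float]]:
    """Box-average a grayscale image into a size x size grid."""
    height, width = len(image), len(image[0])
    thumb = []
    for ty in range(size):
        y0 = ty * height // size
        y1 = max(y0 + 1, (ty + 1) * height // size)
        row = []
        for tx in range(size):
            x0 = tx * width // size
            x1 = max(x0 + 1, (tx + 1) * width // size)
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        thumb.append(row)
    return thumb


def thumbnail_distance(t1: Sequence[Sequence[float]],
                       t2: Sequence[Sequence[float]]) -> float:
    """Mean absolute cell difference, scaled to 0.0 (same) .. 1.0 (opposite)."""
    diffs = [abs(a - b) for r1, r2 in zip(t1, t2) for a, b in zip(r1, r2)]
    return sum(diffs) / (255.0 * len(diffs))


# A 16x16 horizontal gradient, a slightly brightened copy, and a mirrored copy.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[value + 2 for value in row] for row in gradient]
mirrored = [list(reversed(row)) for row in gradient]

base = to_thumbnail(gradient, size=4)
print(thumbnail_distance(base, to_thumbnail(brighter, size=4)))  # small
print(thumbnail_distance(base, to_thumbnail(mirrored, size=4)))  # large
```

The brightened copy stays close to the original, while the mirrored copy ends up far away even though it is the same picture, which is the false-negative case for flipped images.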

Combining both techniques is a reasonable way to hedge your bets and reduce the occurrence of both false positives and false negatives.
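One possible way to combine the two signals is a weighted average of a histogram distance and a thumbnail distance, each already scaled to the range 0.0 to 1.0. The answer does not prescribe how to combine them, so the weighting scheme, the function names, and the default `hist_weight` and `threshold` values below are all assumptions that would need tuning against real data.

```python
def combined_distance(hist_distance: float,
                      thumb_distance: float,
                      hist_weight: float = 0.5) -> float:
    """Weighted average of two distances that both lie in 0.0 .. 1.0."""
    if not 0.0 <= hist_weight <= 1.0:
        raise ValueError("hist_weight must be between 0 and 1")
    return hist_weight * hist_distance + (1.0 - hist_weight) * thumb_distance


def is_probable_match(hist_distance: float,
                      thumb_distance: float,
                      threshold: float = 0.1,
                      hist_weight: float = 0.5) -> bool:
    """Treat two images as similar when the combined distance is small."""
    return combined_distance(hist_distance, thumb_distance, hist_weight) <= threshold


# Similar colors and similar layout: both distances are small.
print(is_probable_match(0.02, 0.04))

# Identical colors but a very different layout (for example a mirrored image).
print(is_probable_match(0.0, 0.5))
```

Raising `hist_weight` makes the comparison more tolerant of flips and rotations at the cost of more false positives between images that merely share a palette; lowering it does the opposite.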
