Image fingerprint to compare similarity of many images

Question

I need to create fingerprints of many images (about 100,000 existing, 1,000 new per day, RGB, JPEG, max size 800x800) so that every image can be compared to every other image very fast. I can't use binary compare methods because images that are nearly similar should also be recognized.

An existing library would be best, but pointers to existing algorithms would also help me a lot.

Recommended answer

Normal hashing or CRC calculation algorithms do not work well with image data. The dimensional nature of the information must be taken into account.

If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can use a Radon transform on the image source to produce a normative mapping of the image data - store this with each image and then compare just the fingerprints. This is a complex algorithm and not for the faint of heart.

A few simple solutions are possible:

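  1. Create a luminosity histogram for the image as a fingerprint
  2. Create a scaled-down version of each image as a fingerprint
  3. Combine techniques (1) and (2) into a hybrid approach for improved comparison quality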

A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image - and can be implemented quite efficiently. Subtracting one histogram from another produces a new histogram which you can process to decide how similar two images are. Because histograms only evaluate the distribution and occurrence of luminosity/color information, they handle affine transformations quite well. If you quantize each color component's luminosity information down to an 8-bit value, 768 bytes of storage (3 channels x 256 bins, one byte per bin) are sufficient for the fingerprint of an image of almost any reasonable size. Luminosity histograms produce false negatives when the color information in an image is manipulated: transformations such as contrast/brightness adjustment, posterization, or color shifting change the luminosity information. False positives are also possible with certain types of images, such as landscapes and images where a single color dominates the others.
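As a rough illustration of the histogram idea, here is a minimal pure-Python sketch. The function names are made up for this example, and two choices are assumptions rather than anything stated above: each bin is scaled relative to the largest bin of its channel before being stored as one byte, and the "subtract and process" step is reduced to a mean absolute bin difference. Decoding the JPEG into (r, g, b) tuples is left out and would need an imaging library.

```python
def histogram_fingerprint(pixels):
    """Build a 768-byte fingerprint: 256 bins for each of R, G and B.

    `pixels` is an iterable of (r, g, b) tuples with values 0..255.
    """
    counts = [[0] * 256 for _ in range(3)]
    total = 0
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            counts[channel][value] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")

    # Assumption: quantize each bin to one byte relative to the largest bin
    # of its channel, so the fingerprint size is independent of image size.
    fingerprint = bytearray()
    for channel_counts in counts:
        peak = max(channel_counts)
        fingerprint.extend(round(255 * c / peak) for c in channel_counts)
    return bytes(fingerprint)


def histogram_distance(fp_a, fp_b):
    """Mean absolute bin difference: 0.0 means identical histograms."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b)) / (255 * len(fp_a))


# Usage: a mirrored image has the same pixels in a different order,
# so its histogram fingerprint is unchanged.
original = [(10, 20, 30), (200, 100, 50), (10, 20, 30), (0, 0, 0)]
mirrored = original[::-1]
same = histogram_fingerprint(original) == histogram_fingerprint(mirrored)
```

Because only the distribution of values is stored, the pixel order never enters the fingerprint, which is why flips and rotations do not disturb it. Any threshold on the distance that counts as "similar" would have to be tuned on real data.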

Using scaled images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much information to be of use - so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike with histogram data, you have to perform anisotropic scaling of the image data when the source resolutions have varying aspect ratios. In other words, reducing a 300x800 image to an 80x80 thumbnail deforms the image, so comparing it with a very similar 300x500 image will cause false negatives. Thumbnail fingerprints also often produce false negatives when affine transformations are involved: if you flip or rotate an image, its thumbnail will be quite different from the original's, and the match may be missed (a false negative).
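The thumbnail approach could be sketched as follows, again in pure Python with hypothetical function names. It assumes the image has already been decoded and converted to grayscale (a list of rows with values 0..255), and it uses simple block averaging for the downscale, which is only one of several possible resampling methods.

```python
def thumbnail_fingerprint(gray, size=80):
    """Shrink a grayscale image to size x size by block averaging.

    `gray` is a list of rows, each a list of values 0..255. Width and
    height are scaled independently, so the aspect ratio is NOT preserved.
    """
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the thumbnail")
    thumb = []
    for ty in range(size):
        y0, y1 = ty * height // size, (ty + 1) * height // size
        for tx in range(size):
            x0, x1 = tx * width // size, (tx + 1) * width // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            thumb.append(sum(block) // len(block))
    return bytes(thumb)


def thumbnail_distance(fp_a, fp_b):
    """Mean absolute pixel difference: 0.0 means identical thumbnails."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b)) / (255 * len(fp_a))


# Usage: a tiny image that is dark on the left and bright on the right,
# and its horizontal mirror. The flip turns every dark cell bright and
# vice versa, so the distance is as large as it can be.
image = [[0, 0, 255, 255] for _ in range(4)]
flipped = [row[::-1] for row in image]
d = thumbnail_distance(thumbnail_fingerprint(image, size=2),
                       thumbnail_fingerprint(flipped, size=2))
```

The usage lines show the weakness described above: the flipped image contains exactly the same pixels, yet its thumbnail is completely different, so a thumbnail-only comparison would miss the match.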

Combining both techniques is a reasonable way to hedge your bets and reduce the occurrence of both false positives and false negatives.
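One simple way to combine the two signals is a weighted blend of the two distances. This is only a sketch: the function names, the equal default weighting, and the 0.1 threshold are illustrative assumptions, not values from the answer, and both inputs are assumed to be distances already normalized to the range 0..1.

```python
def hybrid_distance(hist_dist, thumb_dist, hist_weight=0.5):
    """Blend a histogram distance and a thumbnail distance (both in 0..1).

    `hist_weight` is the share given to the histogram distance; the rest
    goes to the thumbnail distance.
    """
    if not 0.0 <= hist_weight <= 1.0:
        raise ValueError("hist_weight must be between 0 and 1")
    return hist_weight * hist_dist + (1.0 - hist_weight) * thumb_dist


def is_probable_match(hist_dist, thumb_dist, threshold=0.1, hist_weight=0.5):
    """Treat two images as similar when the blended distance is small."""
    return hybrid_distance(hist_dist, thumb_dist, hist_weight) <= threshold


# Usage: identical histograms but completely different thumbnails
# (for example a mirrored image) land in the middle of the scale.
score = hybrid_distance(0.0, 1.0)
```

With a blend, neither fingerprint alone decides the outcome. Raising `hist_weight` makes the comparison more tolerant of flips and rotations, while lowering it makes the comparison stricter about spatial layout.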
