快速,简单的图像哈希算法 [英] Fast and simple image hashing algorithm

查看:163
本文介绍了快速,简单的图像哈希算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要一个(preferably简单和快速)的图像哈希算法。散列值用于在查找表中,而不是对加密

I need a (preferably simple and fast) image hashing algorithm. The hash value is used in a lookup table, not for cryptography.

一些图像是计算机图形 - 即纯色填充rects,栅格化文字等,而也有摄影图像 - 含有丰富的色彩光谱,多为光滑,有合理的噪声振幅

Some of the images are "computer graphic" - i.e. solid-color filled rects, rasterized texts and etc., whereas there are also "photographic" images - containing rich color spectrum, mostly smooth, with reasonable noise amplitude.

我也喜欢散列算法,以便能够被应用到特定的图像部分。我的意思是,该图像可以被划分成的网格单元,并且每个单元的散列函数应该仅取决于该单元的内容。使人们可以发现快,如果两个图像具有的公共区域(如果他们适当对齐)。

I'd also like the hashing algorithm to be able to be applied to specific image parts. I mean, the image can be divided into a grid cells, and the hash function of each cell should depend only on the contents of this cell. So that one may spot quickly if two images have common areas (in case they're aligned appropriately).

注意:我只需要知道,如果两个图像(或其部分)的相同。也就是说,我并不需要类似的图像匹配,就没有必要在特征识别,关联和其它DSP技术。

Note: I only need to know if two images (or their parts) are identical. That is, I don't need to match similar images, there's no need in feature recognition, correlation, and other DSP techniques.

我不知道什么是preferred散列算法。

I wonder what is the preferred hashing algorithm.

有关照片的图像只是一个网格单元内的异或所有的像素是确定更多或更少。对于不同的图像中的相同的散列值的概率是pretty的低,特别是由于(近白色)噪声的presence中断所有潜在对称。加这样的散列函数的频谱看起来不错(任何值是可能的几乎相同的概率)。

For "photographic" images just XOR-ing all the pixels within a grid cell is ok more-or-less. The probability of the same hash value for different images is pretty low, especially because the presence of the (nearly white) noise breaks all the potential symmetries. Plus the spectrum of such a hash function looks good (any value is possible with nearly the same probability).

但是这样的幼稚算法可以不与人为的图形使用。相同的像素,重复图案,几何偏移不变性对于这样的图像非常普遍。异或所有像素会给0与偶数个同一象素的任何图像。

But such a naive algorithm may not be used with "artificial" graphics. Identical pixels, repeating patterns, geometrical offset invariance are very common for such images. XOR-ing all the pixels will give 0 for any image with even number of identical pixels.

使用类似CRT-32看起来有点前途,但我想计算出的东西更快。我想到了迭代公式,每一个新的像素变异当前的哈希值,像这样的:

Using something like CRT-32 looks somewhat promising, but I'd like to figure-out something faster. I thought about iterative formula, each new pixel mutates the current hash value, like this:

hashValue = (hashValue * /*something*/ | newPixelValue) % /* huge prime */

做模素数,也许应该提供一个良好的分散性,使我倾向于此选项。不过,我想知道是否有更好的varians。

Doing modulo prime number should probably give a good dispersion, so that I'm leaning toward this option. But I'd like to know if there are better varians.

在此先感谢。

推荐答案

如果你想非常快,你应该考虑像素的随机子集,以避免读取整个图像。接着,计算对值中的那些像素的顺序散列函数。的随机子集应可通过与固定的种子,使相同图像产生相同的子集,因此相同散列值的一个确定的伪随机数发生器来选择。

If you want to make it very fast, you should consider taking a random subset of the pixels to avoid reading the entire image. Next, compute a hash function on the sequence of values at those pixels. The random subset should be selected by a deterministic pseudo-random number generator with fixed seed so that identical images produce identical subsets and consequently identical hash values.

这甚至应该进行人工图像合理的工作。但是,如果有一个彼此由少数像素的不同图像,这是要给哈希冲突​​。更多的迭代提供更好的可靠性。如果是这样的情况下,例如,如果你的图像中设置很可能有对与一个不同的像素,则必须读的每个像素来计算哈希值。以伪随机系数的简单线性组合就足够好,即使人工图像。

This should work reasonably well even for artificial images. However, if you have images which differ from each other by a small number of pixels, this is going to give hash collisions. More iterations give better reliability. If that is the case, for instance, if your images set is likely to have pairs with one different pixel, you must read every pixel to compute the hash value. Taking a simple linear combination with pseudo-random coefficients would be good enough even for artificial images.

伪code的一个简单的算法

pseudo-code of a simple algorithm

Random generator = new generator(2847)  // Initialized with fixed seed
int num_iterations = 100

int hash(Image image) {
    generator.reset()   //To ensure consistency on each evaluation
    int value = 0
    for num_iteration steps {
        int nextValue = image.getPixel(generator.nextInt()%image.getSize()).getValue()
        value = value + nextValue*generator.nextInt()
    }
    return value
}

这篇关于快速,简单的图像哈希算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆