Simple hash of PIL image


Question

I want to store information about PIL images in a key-value store. To do that, I hash the image and use the hash as the key.

I have been using the following code to compute the hash:

import hashlib

def hash(img):
    return hashlib.md5(img.tobytes()).hexdigest()

But this does not seem to be stable. I have not figured out why, but for the same image on different machines, I get different hashes.

Is there a simple way of hashing images that depends only on the image itself (not on timestamps, system architecture, etc.)?

Note that I do not need similar images to get a similar or identical hash, as in image hashing. In fact, I want different images to have different hashes; for example, changing the brightness of the image should change its hash.

Recommended answer

I'm guessing your goal is to perform image hashing in Python, which is quite different from classic hashing because the byte representation of an image depends on its format, resolution, and so on.
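For contrast with the similarity-oriented approach described in this answer, here is a minimal sketch of an exact (non-perceptual) key, in case that is what is actually needed. It is an assumption on my part, not something from the question or the answer: the helper name `exact_image_key` is hypothetical, and it uses SHA-256 rather than the MD5 from the question. The idea is to feed the mode and size into the digest along with the raw pixel bytes, since raw bytes alone do not say how they are laid out.

```python
import hashlib

def exact_image_key(mode, size, raw_bytes):
    """Digest of mode, size and raw pixel bytes (hypothetical helper)."""
    h = hashlib.sha256()
    # Prefix the digest input with the mode and dimensions so that two images
    # with identical raw bytes but different shapes get different keys.
    header = f"{mode}:{size[0]}x{size[1]}:".encode("ascii")
    h.update(header)
    h.update(raw_bytes)
    return h.hexdigest()

# With a PIL image this would be called as, for example:
#   key = exact_image_key(img.mode, img.size, img.tobytes())
# Made-up pixel data standing in for three tiny grayscale images:
key_a = exact_image_key("L", (2, 2), bytes([0, 64, 128, 255]))
key_b = exact_image_key("L", (4, 1), bytes([0, 64, 128, 255]))  # same bytes, other shape
key_c = exact_image_key("L", (2, 2), bytes([0, 64, 128, 254]))  # one pixel changed
```

A key built this way changes whenever any pixel changes, which matches the brightness requirement in the question. It is only as stable as the decoded pixels themselves: if two machines decode the same file to slightly different pixel values, the keys will still differ.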

One image hashing technique is average hashing. Keep in mind that it is not 100% accurate, but it works fine in most cases.

First, we simplify the image by reducing its size and colors. Reducing the complexity of the image contributes greatly to the accuracy of comparisons with other images:

Reduce the size:

from PIL import Image
img = img.resize((10, 10), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10

Reduce the colors:

img = img.convert("L")

Then we find the average pixel value of the image (one of the main components of average hashing):

pixel_data = list(img.getdata())
avg_pixel = sum(pixel_data)/len(pixel_data)

Finally, the hash is computed. We compare each pixel in the image to the average pixel value: if the pixel is greater than or equal to the average, we get 1; otherwise we get 0. We then convert these bits to a base-16 representation:

bits = "".join(['1' if (px >= avg_pixel) else '0' for px in pixel_data])
hex_representation = str(hex(int(bits, 2)))[2:][::-1].upper()
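The averaging and thresholding steps can be put together in one function. This is a sketch under stated assumptions rather than the answer's exact code: the name `average_hash` is hypothetical, the input is assumed to be a flat list of grayscale values like `pixel_data` above, and it deliberately differs from the line above in two ways: it does not reverse the hex string, and it zero-pads the result so that every hash of the same pixel count has the same length.

```python
def average_hash(pixel_data):
    """Return a fixed-width uppercase hex average hash of grayscale values."""
    avg_pixel = sum(pixel_data) / len(pixel_data)
    # One bit per pixel: 1 if the pixel is at least as bright as the average.
    bits = "".join("1" if px >= avg_pixel else "0" for px in pixel_data)
    # Zero-pad to ceil(n / 4) hex digits so leading zero bits are not dropped.
    width = (len(bits) + 3) // 4
    return format(int(bits, 2), f"0{width}X")

# A 2x4 "image" given as a flat list of grayscale values (made-up data).
sample = [10, 20, 200, 210, 220, 15, 5, 230]
sample_hash = average_hash(sample)  # average is 113.75, bits are 00111001
print(sample_hash)  # prints 39
```

The fixed width matters mainly for comparison: without padding, an image whose first pixels fall below the average yields a shorter string, and position-wise comparisons between hashes stop lining up.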

If you want to compare this image with other images, perform the steps above on each of them and measure the similarity between the hexadecimal representations of their average hashes. You can use something as simple as Hamming distance, or more complex algorithms such as Levenshtein distance, Ratcliff/Obershelp pattern recognition (SequenceMatcher), cosine similarity, etc.
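As an illustration of the simplest option, a Hamming distance between two hex hashes could look like the sketch below. The function name `hamming_distance` is hypothetical, and it assumes both hashes have the same length, meaning they were produced from the same number of pixels and were not shortened by dropped leading zeros.

```python
def hamming_distance(hex_a, hex_b):
    """Count differing bits between two equal-length hex strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree; count those bits.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# 0x39 is 00111001 and 0x3B is 00111011, so exactly one bit differs.
distance = hamming_distance("39", "3B")
print(distance)  # prints 1
```

A distance of 0 means the two average hashes are identical. How large a distance still counts as "similar" depends on the hash size and the application, so any threshold has to be chosen empirically.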
