Compute hash of only the core image data (excluding metadata) for an image

Question

I'm writing a script to calculate the MD5 sum of an image excluding the EXIF tag.

In order to do this accurately, I need to know where the EXIF tag is located in the file (beginning, middle, end) so that I can exclude it.

How can I determine where in the file the tag is located?

The images I am scanning are in TIFF, JPG, PNG, BMP, DNG, CR2 and NEF format, along with some MOV, AVI and MPG videos.

Answer

One simple way to do it is to hash only the core image data. For PNG, you could do this by hashing only the "critical chunks" (i.e. the ones whose type name starts with a capital letter). JPEG has a similar but simpler file structure.
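
A minimal sketch of the critical-chunk idea, assuming the whole file fits in memory. The names `critical_chunk_md5`, `chunk` and `tiny_png` are hypothetical helpers made up for illustration. The original script below hashes only the IDAT payloads. This variant hashes the type and payload of every critical chunk (IHDR, PLTE, IDAT, IEND), so a change in dimensions or palette also changes the hash, while ancillary chunks such as tEXt, iTXt or eXIf are skipped. The demo builds a 1x1 PNG in memory rather than reading a real file.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def critical_chunk_md5(data: bytes) -> str:
    """MD5 over the type and payload of every critical chunk in a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    md5 = hashlib.md5()
    pos = 8
    while pos + 8 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        # Critical chunks have an uppercase first letter (bit 5 of the byte is 0).
        if ctype[0] & 0x20 == 0:
            md5.update(ctype)
            md5.update(data[pos + 8:pos + 8 + length])
        pos += 12 + length  # skip length + type + payload + CRC
    return md5.hexdigest()


# Hypothetical helpers that build a tiny in-memory PNG for the demo below.
def chunk(ctype: bytes, payload: bytes) -> bytes:
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def tiny_png(extra: bytes = b"") -> bytes:
    """A 1x1 8-bit grayscale PNG, with optional extra chunks before IDAT."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x7f")  # filter type 0, then one gray pixel
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr) + extra
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))


plain = tiny_png()
tagged = tiny_png(chunk(b"tEXt", b"Comment\x00hello"))
print(plain == tagged)  # False: the files differ byte for byte
print(critical_chunk_md5(plain) == critical_chunk_md5(tagged))  # True
```

Hashing the chunk type along with the payload is a design choice. It keeps, for example, a PLTE payload from being confused with IDAT bytes.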

The visual hash in ImageMagick decompresses the image before hashing it. In your case, you could hash the compressed image data directly, so (if implemented correctly) it should be just as fast as hashing the raw file.
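
Locating the compressed data only means reading each segment header and skipping past it, so no pixel decoding is involved. The sketch below is not part of the original answer, and `jpeg_segments` and `exif_offset` are hypothetical names. It walks the JPEG marker segments and reports their byte offsets. This also answers the question of where the EXIF block sits in a JPEG: it is an APP1 (`0xFFE1`) segment whose payload begins with `Exif\0\0`, and it appears before the Start of Scan (SOS, `0xFFDA`) marker. The sketch assumes that every marker before SOS carries a length field and that no `0xFF` fill bytes sit between segments, which holds for typical files.

```python
import struct


def jpeg_segments(data: bytes):
    """Yield (offset, marker, payload) for each JPEG segment up to Start of Scan.

    Assumes every marker before SOS carries a 2-byte length field and that no
    0xFF fill bytes sit between segments, which holds for typical files.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG")
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker & 0xFF00 != 0xFF00:
            raise ValueError(f"expected a marker at offset {pos}")
        # The length field counts itself (2 bytes) but not the 2-byte marker.
        yield pos, marker, data[pos + 4:pos + 2 + length]
        if marker == 0xFFDA:  # SOS: entropy-coded image data follows
            return
        pos += 2 + length


def exif_offset(data: bytes):
    """Byte offset of the APP1 segment holding EXIF data, or None."""
    for offset, marker, payload in jpeg_segments(data):
        if marker == 0xFFE1 and payload.startswith(b"Exif\x00\x00"):
            return offset
    return None
```

The script's `jpeg()` function feeds everything after the SOS marker and its length field to MD5. Any APP1/EXIF segment earlier in the file is therefore excluded automatically.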

This is a small Python script illustrating the idea. It may or may not work for you, but it should at least give an idea of what I mean :)

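import hashlib
import os
import struct


def png(fh):
    """Return the MD5 of the concatenated IDAT chunk payloads of a PNG."""
    md5 = hashlib.md5()
    # The 8-byte PNG signature is \x89PNG\r\n\x1a\n.
    assert fh.read(8)[1:4] == b"PNG"
    while True:
        try:
            # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
            length, = struct.unpack(">I", fh.read(4))
        except struct.error:
            break  # end of file
        if fh.read(4) == b"IDAT":
            md5.update(fh.read(length))
            fh.read(4)  # skip the CRC
        else:
            fh.seek(length + 4, os.SEEK_CUR)  # skip data + CRC
    return md5.hexdigest()


def jpeg(fh):
    """Return the MD5 of everything after the first Start of Scan marker."""
    md5 = hashlib.md5()
    assert fh.read(2) == b"\xff\xd8"  # SOI marker
    while True:
        marker, length = struct.unpack(">2H", fh.read(4))
        assert marker & 0xFF00 == 0xFF00
        if marker == 0xFFDA:  # Start of Scan: compressed image data follows
            md5.update(fh.read())
            break
        # Skip this segment; its length field counts its own 2 bytes.
        fh.seek(length - 2, os.SEEK_CUR)
    return md5.hexdigest()


if __name__ == '__main__':
    with open("sample.png", "rb") as fh:
        print("Hash:", png(fh))
    with open("sample.jpg", "rb") as fh:
        print("Hash:", jpeg(fh))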

这篇关于仅计算图像的核心图像数据(不包括元数据)的哈希的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆