Metadata extraction from PNG images

Question

How can I extract metadata from an image, the way this website does? I have used the Exiv2 library, but it gives only limited data compared to this website. Is there a more advanced library?

I have already tried the hachoir-metadata Python library.

Also, how does Windows extract the image details (the ones we see in the file's Properties)?

Answer

PNG files are made up of chunks; in an average PNG, most of the file consists of IDAT chunks, which contain the compressed pixel data. Every PNG starts (after an 8-byte signature) with an IHDR chunk and ends with an IEND chunk. Because PNG is such a flexible standard, it can be extended by defining new chunk types; this is how Animated PNG (APNG) works. Every browser can display the first frame, but browsers that understand the chunk types used by APNG can show the animation.
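
To make the chunk layout concrete, here is a minimal sketch using only the Python standard library. It assumes the whole file fits in memory, and `iter_chunks`, `make_chunk` and `tiny_png` are hypothetical helper names, not part of any library. Each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data.

```python
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_type, chunk_data) pairs from raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        # Chunk layout: 4-byte length, 4-byte type, <length> bytes of data, 4-byte CRC
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and data fields, not the length field
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        yield ctype, body
        pos += 12 + length
        if ctype == b"IEND":
            break


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one chunk (hypothetical helper for the demo below)."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Build a 1x1 8-bit greyscale PNG with one tEXt chunk."""
    # IHDR: width, height, bit depth, colour type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # One scanline: filter byte 0 followed by a single pixel value
    idat = zlib.compress(b"\x00\x00")
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"tEXt", b"Title\x00PNG")
            + make_chunk(b"IDAT", idat)
            + make_chunk(b"IEND", b""))


print([t.decode("ascii") for t, _ in iter_chunks(tiny_png())])
# ['IHDR', 'tEXt', 'IDAT', 'IEND']
```

Checking the CRC is optional for a metadata reader, but it catches truncated or corrupted files early instead of returning garbage text.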

There are many places where text data can live in a PNG image, and even more places where metadata can live. Here is a very convenient summary. You mentioned the "Description" tag, which can only live in text chunks, so that is what I'll focus on.

The PNG standard defines three different types of text chunks: tEXt (Latin-1 encoded, uncompressed), zTXt (compressed, also Latin-1), and finally iTXt, the most useful of the three, because it can hold UTF-8 encoded text and can be either compressed or uncompressed.
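
The three layouts differ only in what follows the NUL-terminated keyword. As a sketch of how the payloads decode (standard library only; `decode_text_chunk` is a hypothetical name, and the input is assumed to be the data field of a single chunk):

```python
import zlib
from typing import Tuple


def decode_text_chunk(ctype: bytes, data: bytes) -> Tuple[str, str]:
    """Return (keyword, text) for the body of a tEXt, zTXt or iTXt chunk."""
    keyword, _, rest = data.partition(b"\x00")
    key = keyword.decode("latin-1")
    if ctype == b"tEXt":
        # keyword NUL text; Latin-1, uncompressed
        return key, rest.decode("latin-1")
    if ctype == b"zTXt":
        # keyword NUL, 1-byte compression method (0 = zlib), zlib-compressed Latin-1 text
        return key, zlib.decompress(rest[1:]).decode("latin-1")
    if ctype == b"iTXt":
        # keyword NUL, compression flag, compression method,
        # language tag NUL, translated keyword NUL, UTF-8 text
        compressed = rest[0]
        _lang, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return key, text.decode("utf-8")
    raise ValueError(f"not a text chunk: {ctype!r}")


print(decode_text_chunk(b"tEXt", b"Author\x00La plume de ma tante"))
# ('Author', 'La plume de ma tante')
```

Combined with a chunk walker, this is essentially all a text-metadata extractor for PNG needs; the language tag and translated keyword of iTXt are discarded here for brevity.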

So, your question becomes, "what is a convenient way to extract the text blocks?"

At first, I thought pypng could do this, but it cannot:

tEXt / zTXt / iTXt: Ignored when reading. Not generated.

Luckily, Pillow supports this; amusingly, the support was added only one day before you asked your original question!

So, without further ado, let's find an image containing an iTXt chunk: this example ought to do.

>>> from PIL import Image
>>> im = Image.open('/tmp/itxt.png')
>>> im.info
{'interlace': 1, 'gamma': 0.45455, 'dpi': (72, 72), 'Title': 'PNG', 'Author': 'La plume de ma tante'}

According to the source code, tEXt and zTXt are also covered.
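
A quick way to check all three chunk types without hunting for sample files is a round trip through an in-memory PNG. This sketch assumes a reasonably recent Pillow: `PngInfo.add_text` (with `zip=True` for zTXt) and `PngInfo.add_itxt` write the chunks, and the `text` attribute of the opened image reads them back. As far as I know, `im.info` only holds text chunks that appear before the image data, while `im.text` also picks up ones stored after IDAT (loading the image to find them). The keys and values are made up for the example.

```python
import io

from PIL import Image
from PIL.PngImagePlugin import PngInfo

info = PngInfo()
info.add_text("Title", "PNG")                     # written as tEXt
info.add_text("Comment", "compressed", zip=True)  # written as zTXt
info.add_itxt("Description", "Plume à écrire ✓")  # written as iTXt (UTF-8)

buf = io.BytesIO()
Image.new("L", (1, 1)).save(buf, "PNG", pnginfo=info)
buf.seek(0)

im = Image.open(buf)
# Every text chunk, whichever of the three types it was stored as
for key, value in im.text.items():
    print(key, "=", value)
```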

For the more general case, judging by Pillow's other format readers, the JPEG and GIF ones also seem to cover the metadata of those formats well, so I would recommend PIL (Pillow) for this. That's not to say the maintainers of hachoir-metadata wouldn't appreciate a pull request adding text chunk support, though! :-)
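
For JPEG, the relevant Pillow entry point is `Image.getexif()`. Here is a minimal round-trip sketch, assuming a recent Pillow; it writes the standard EXIF/TIFF `ImageDescription` tag (270, `0x010E`) into an in-memory JPEG and reads it back. The description string is only sample data.

```python
import io

from PIL import Image

IMAGE_DESCRIPTION = 0x010E  # standard EXIF/TIFF tag number 270

exif = Image.Exif()
exif[IMAGE_DESCRIPTION] = "La plume de ma tante"

buf = io.BytesIO()
# Serialize the EXIF block explicitly so the JPEG writer receives raw bytes
Image.new("RGB", (8, 8)).save(buf, "JPEG", exif=exif.tobytes())
buf.seek(0)

with Image.open(buf) as im:
    tags = im.getexif()
    print(tags.get(IMAGE_DESCRIPTION))
```

The same `getexif()` call works on any image Pillow can open, so one code path can cover several formats.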
