How to recognize an image file format using its contents?


Question

If an image file is in .png format, it contains ‰PNG at the beginning of the file (when read in text mode).

If an image file is in .bmp format, it contains BM at the beginning of the file (when read in text mode).

I know that image formats contain text (data) of a certain size (in bytes) at the beginning of the file, which serves as metadata of the image file.

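My questions are:

  • Is this behavior the same in all image file formats (or formats in general)?
  • Could an image file (with no extension) be recognized just using this data?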

Is there information available on how this metadata is broken down? By that I mean: what does the data at each position in the metadata mean?

Recommended answer

Is this behavior the same in all image file formats (or formats in general)?

For most of them, yes. Some proprietary formats (e.g. for games) might have very short metadata or none at all. The metadata might also live in a separate file (e.g. animations accompanied by XML metadata).

Could an image file (with no extension) be recognized just using this data?

Yes. In fact, many image viewers will warn you if an image file has an incorrect extension and ask whether they should fix it.

On Unix systems, the file command identifies files based on this kind of metadata (their leading "magic" bytes). For images specifically there is a better tool called identify (part of ImageMagick), which returns more detailed information such as resolution and bit depth.
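
To illustrate the idea behind such tools, here is a minimal Python sketch that checks the first bytes of a file against a handful of well-known signatures. The names SIGNATURES, sniff_image_format and sniff_file are hypothetical helpers made up for this example, and the table is deliberately tiny; the real file command consults a much larger database.

```python
# Hypothetical lookup table: a few well-known leading byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),  # full 8-byte PNG signature
    (b"BM", "bmp"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff_image_format(data: bytes):
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read only the first few bytes of a file (in binary mode) and sniff them."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(16))


print(sniff_image_format(b"BM" + bytes(12)))
```

Note that the file is opened in binary mode ("rb"), so no byte is reinterpreted as text. A two-byte signature such as BM can also match unrelated files by chance, so a robust detector would validate more of the header.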

Is there information available on how this metadata is broken down? By that I mean: what does the data at each position in the metadata mean?

There are books about (image) file formats, and for most formats this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compression methods, and state what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.

Note that, based on the examples you gave, I suppose you opened the files with a text editor, which is not the ideal tool for this task. It is better to use a hex editor. By default, text editors won't display most byte values (e.g. 255) and will interpret others (e.g. tab or line feed). They might be good enough to see magic text strings like "BM" and "PNG", but with a hex editor you can see both these text parts and their numerical representation, which lets you extract, for example, the image width and height. For this, a tool that converts hexadecimal values to decimal is useful; most calculators can do this.
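
If no hex editor is at hand, a few lines of Python can produce a similar view. This is only a rough sketch; hex_dump is a hypothetical helper written for this example, and the sample bytes are the 8-byte PNG signature rather than a real file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII (dots otherwise)."""
    pad = width * 3 - 1  # width of the hex column when a row is full
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show only printable ASCII; everything else becomes a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


sample = b"\x89PNG\r\n\x1a\n"  # the PNG signature
print(hex_dump(sample))
```

The text column shows only "PNG" surrounded by dots, while the hex column reveals all eight bytes, including the carriage return and line feed that a text editor would silently interpret.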

As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:

In both of them you can see that the file is a PNG image, which is correct. But the marked part in the hex editor view shows the width and height of the image (matching the PNG chunk specification of the "IHDR" part): 0x00001802 is 6146 in decimal and 0x000037D5 is 14293. There's no way to read this off in the text editor.
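
The same extraction can be sketched in Python. Per the PNG specification, the 8-byte signature is followed by the IHDR chunk, whose data begins with the width and height as 4-byte big-endian unsigned integers. The function name png_dimensions is hypothetical, and the header below is synthetic: it is built to match the example values and is not a complete PNG file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Return (width, height) read from the IHDR chunk at the start of PNG data."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    # Bytes 8..11 hold the chunk length, bytes 12..15 the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # Width and height: two 4-byte big-endian unsigned ints at offsets 16 and 20.
    return struct.unpack(">II", data[16:24])


# Synthetic header: signature, chunk length 13, "IHDR", then width and height.
header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">II", 0x00001802, 0x000037D5))
print(png_dimensions(header))  # (6146, 14293)
```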

Also note that even if you don't know an image format, you might get lucky by simply guessing that it is uncompressed data (this often works for some game image file formats, most notably Unity's "assets"). For example, if you rename files to ".raw", the image viewer IrfanView will give you a dialog (see the screenshot below) where you can guess the width, height and bit depth of the image and see whether the result looks good. This requires some experience in interpreting the outcome, though: if width and bit depth don't match, images will look like noise, appear warped, or have wrong colors.

This "image geometry guessing" can be improved/automated by trying different widths and computing the correlation coefficient between two lines. The tool raw2tiff can do this. Quote from the site:

There is no magic, it is just a mathematical statistics, so it can be wrong in some cases. But for most ordinary images guessing method will work fine.
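
The idea can be sketched in a few lines of Python. This is not raw2tiff's actual implementation, only an illustration of the principle under simplifying assumptions: the data is 8-bit grayscale, has no header, and the candidate widths are supplied by the caller. The names row_correlation and guess_width are hypothetical.

```python
from math import sqrt


def row_correlation(a, b):
    """Pearson correlation coefficient between two equally long pixel rows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat rows say nothing about alignment
    return cov / sqrt(var_a * var_b)


def guess_width(pixels: bytes, candidates):
    """Return the candidate width whose adjacent rows correlate best on average."""
    best_width, best_score = None, float("-inf")
    for width in candidates:
        # Split the data into full rows of this width; drop a trailing partial row.
        rows = [pixels[i:i + width]
                for i in range(0, len(pixels) - width + 1, width)]
        if len(rows) < 2:
            continue
        scores = [row_correlation(rows[i], rows[i + 1])
                  for i in range(len(rows) - 1)]
        score = sum(scores) / len(scores)
        if score > best_score:
            best_width, best_score = width, score
    return best_width


# Synthetic 12 x 10 image: a horizontal ramp that gets slightly brighter per row.
true_width, height = 12, 10
image = bytes(20 * x + 3 * y for y in range(height) for x in range(true_width))
print(guess_width(image, range(8, 21)))
```

With the correct width, neighboring rows line up and correlate strongly; with a wrong width the rows are sheared against each other and the correlation drops. Multiples of the true width can score similarly well, which is one reason such a guess can be wrong.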
