How to check if file is/isn't an image without loading full file? Is there an image header-reading library?


Question

Edit:

Sorry, I guess my question was vague. I'd like to have a way to check if a file is not an image without wasting time loading the whole image, because then I can do the rest of the loading later. I don't want to just check the file extension.

The application just views the images. By 'checking the validity', I meant detecting and skipping the non-image files that are also in the directory. If the pixel data is corrupt, I'd still like to treat the file as an image.

I assign page numbers and pair up these images. Some images are the single left or right page. Some images are wide and are the "spread" of the left and right pages. For example, pagesAt(3) and pagesAt(4) could return the same std::pair of images or a std::pair of the same wide image.

Sometimes, there is an odd number of 'thin' images, and the first image is to be displayed on its own, similar to a wide image. An example would be a single cover page.

Not knowing which files in the directory are non-images means I can't confidently assign those page numbers and pair up the files for displaying. Also, the user may decide to jump to page X, and when I later discover and remove a non-image file and reassign page numbers accordingly, page X could appear to be a different image.

Original:

In case it matters, I'm using C++ and QImage from the Qt library.

I'm iterating through a directory and using the QImage constructor on the paths to the images. This is, of course, pretty slow and makes the application feel unresponsive. However, it does allow me to detect invalid image files and ignore them early on.

I could save only the paths to the images while going through the directory and load them only when they're needed, but then I wouldn't know whether an image is invalid.

I'm considering a combination of the two: while iterating through the directory, read only the headers of the images to check validity, then load the image data when needed.

So,

Will loading just the image headers be much faster than loading the whole image? Or does doing a bit of I/O to read the header mean I might as well finish loading the image in full? Later on, I'll be uncompressing images from archives as well, so this also applies to uncompressing just the header vs. uncompressing the whole file.

Also, I don't know how to load/read just the image headers. Is there a library that can read just the headers of images? Otherwise, I'd have to open each file as a stream and code image header readers for all the filetypes on my own.

Solution

The Unix file tool (which has been around since almost forever) does exactly this. It is a simple tool that uses a database of known file headers and binary signatures to identify the type of the file (and potentially extract some simple information).

The database is a simple text file (which gets compiled for efficiency) that describes a plethora of binary file formats, using a simple structured format (documented in man magic). The source is in /usr/share/file/magic (in Ubuntu). For example, the entry for the PNG file format looks like this:

0       string          \x89PNG\x0d\x0a\x1a\x0a         PNG image
!:mime  image/png
>16     belong          x               \b, %ld x
>20     belong          x               %ld,
>24     byte            x               %d-bit
>25     byte            0               grayscale,
>25     byte            2               \b/color RGB,
>25     byte            3               colormap,
>25     byte            4               gray+alpha,
>25     byte            6               \b/color RGBA,
>28     byte            0               non-interlaced
>28     byte            1               interlaced
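
To make the entry above concrete, here is a minimal C++ sketch that reads the same fields (width at offset 16, height at offset 20, and so on) from the first 29 bytes of a file. The names PngHeader, parsePngHeader and readPngHeader are made up for this illustration; they are not part of the file tool or of Qt.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Fields the magic entry extracts from the first 29 bytes of a PNG file.
struct PngHeader {
    std::uint32_t width;     // "belong" (big-endian long) at offset 16
    std::uint32_t height;    // "belong" at offset 20
    std::uint8_t bitDepth;   // byte at offset 24
    std::uint8_t colorType;  // byte at offset 25 (0, 2, 3, 4 or 6)
    bool interlaced;         // byte at offset 28 (1 means interlaced)
};

const std::size_t kPngHeaderSize = 29;

// Decode a big-endian 32-bit value from four bytes.
inline std::uint32_t readBigEndian32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Fill `out` from a buffer. Returns false if the buffer is too short
// or does not start with the 8-byte PNG signature.
inline bool parsePngHeader(const unsigned char* data, std::size_t size,
                           PngHeader& out) {
    static const unsigned char kSignature[8] = {0x89, 'P', 'N', 'G',
                                                0x0D, 0x0A, 0x1A, 0x0A};
    if (size < kPngHeaderSize || std::memcmp(data, kSignature, 8) != 0) {
        return false;
    }
    out.width = readBigEndian32(data + 16);
    out.height = readBigEndian32(data + 20);
    out.bitDepth = data[24];
    out.colorType = data[25];
    out.interlaced = (data[28] == 1);
    return true;
}

// Read only the first 29 bytes of the file; the pixel data is never touched.
inline bool readPngHeader(const std::string& path, PngHeader& out) {
    std::ifstream in(path.c_str(), std::ios::binary);
    unsigned char buf[kPngHeaderSize] = {};
    in.read(reinterpret_cast<char*>(buf), sizeof buf);
    return parsePngHeader(buf, static_cast<std::size_t>(in.gcount()), out);
}
```

Calling readPngHeader(path, header) on each directory entry would give the width and height needed to tell wide spreads from single pages without decoding any pixels, though only for PNG files.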

You could extract the signatures for just the image file types, and build your own "sniffer", or even use the parser from the file tool (which seems to be BSD-licensed).
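
As a rough idea of what such a "sniffer" might look like, here is a short C++ sketch with a hand-written signature table. The table is a small illustrative subset of well-known image signatures, not an extract of the magic database, and the names Signature, sniffImageType and sniffImageFile are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// One rule: `length` bytes that must appear at `offset` in the file.
struct Signature {
    std::size_t offset;
    const char* bytes;
    std::size_t length;
    const char* mime;
};

// A few common image signatures. Short ones such as "BM" can give
// false positives, so a real table would want extra checks.
static const Signature kImageSignatures[] = {
    {0, "\x89PNG\r\n\x1a\n", 8, "image/png"},
    {0, "\xFF\xD8\xFF", 3, "image/jpeg"},
    {0, "GIF87a", 6, "image/gif"},
    {0, "GIF89a", 6, "image/gif"},
    {0, "BM", 2, "image/bmp"},
    {0, "II*\0", 4, "image/tiff"},  // little-endian TIFF
    {0, "MM\0*", 4, "image/tiff"},  // big-endian TIFF
};

// Returns the MIME type of the first matching rule, or nullptr if none match.
inline const char* sniffImageType(const unsigned char* data, std::size_t size) {
    for (const Signature& sig : kImageSignatures) {
        if (size >= sig.offset + sig.length &&
            std::memcmp(data + sig.offset, sig.bytes, sig.length) == 0) {
            return sig.mime;
        }
    }
    return nullptr;
}

// Reads at most the first 16 bytes of the file, never the pixel data.
// Returns nullptr for unreadable files and for files matching no rule.
inline const char* sniffImageFile(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    unsigned char head[16] = {};
    in.read(reinterpret_cast<char*>(head), sizeof head);
    return sniffImageType(head, static_cast<std::size_t>(in.gcount()));
}
```

While iterating through the directory, any path for which sniffImageFile returns nullptr could be skipped before page numbers are assigned. A file with a valid signature but corrupt pixel data still counts as an image, which matches the behaviour asked for in the question.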
