How to tell if a file is gzip compressed?

Question

I have a Python program that takes text files as input. However, some of these files may be gzip compressed. Is there a cross-platform way, usable from Python, to determine whether a file is gzip compressed? Is the following reliable, or could an ordinary text file 'accidentally' look gzip-like enough for me to get false positives?

import gzip

try:
    gzip.GzipFile(filename, 'r')
    # compressed
    # ...
except OSError:
    # not compressed
    # ...
    pass

Thanks, Ryan

Recommended answer

The magic number for gzip compressed files is 1f 8b. Although testing for this is not 100% reliable, it is highly unlikely that "ordinary text files" start with those two bytes; in UTF-8 that byte sequence is not even legal, because 8b is a continuation byte with no lead byte before it.
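A minimal sketch of the magic-number test, assuming Python 3. The names `GZIP_MAGIC`, `looks_like_gzip` and `open_text` are made up for illustration and are not part of the `gzip` module; the only library calls used are the built-in `open()` and `gzip.open()`.

```python
import gzip

# First two bytes of a gzip file (the magic number 1f 8b).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        # An empty or one-byte file yields fewer than two bytes and compares unequal.
        return f.read(2) == GZIP_MAGIC


def open_text(path, encoding="utf-8"):
    """Open `path` for reading text, decompressing it if it looks like gzip."""
    if looks_like_gzip(path):
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

With this, the rest of the program can call `open_text(path)` and iterate over lines without caring which kind of file it got. A file that merely happens to start with 1f 8b would still fail later, when it is read, so the error handling does not go away.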

Usually gzip compressed files sport the suffix .gz though. Even gzip(1) itself won't unpack files without it unless you --force it to. You could conceivably use that, but you'd still have to deal with a possible IOError (which you have to in any case).

One problem with your approach is that gzip.GzipFile() will not throw an exception if you feed it an uncompressed file. Only a later read() will. This means that you would probably have to implement some of your program logic twice. Ugly.
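One way to keep the duplicated logic small is to put the try/except around the read itself and fall back to a plain read. This is only a sketch, assuming Python 3, where a bad gzip header is reported as OSError (gzip.BadGzipFile, a subclass of OSError, from Python 3.8 on). The function name `read_text_maybe_gzip` is hypothetical, and it loads the whole file into memory, which may not suit large inputs.

```python
import gzip


def read_text_maybe_gzip(path, encoding="utf-8"):
    """Return the decoded contents of `path`, whether it is gzip or plain text."""
    try:
        # Opening succeeds even for non-gzip input; the header is only
        # checked once data is actually read.
        with gzip.open(path, "rb") as f:
            data = f.read()
    except OSError:
        # Not a gzip file (or another I/O problem): retry as a plain file.
        # A genuine I/O error, such as a missing file, is raised again here.
        with open(path, "rb") as f:
            data = f.read()
    return data.decode(encoding)
```

A truncated or corrupted gzip file raises a different exception (for example EOFError) and is deliberately not caught here, since falling back to the raw bytes would hand compressed garbage to the rest of the program.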
