Python gzip refuses to read uncompressed file

Question

I seem to remember that the Python gzip module previously let you read non-gzipped files transparently. This was really useful, because you could read an input file whether or not it was gzipped. You simply didn't have to worry about it.

Now I get an IOError exception (in Python 2.7.5):

Traceback (most recent call last):
  File "tst.py", line 14, in <module>
    rec = fd.readline()
  File "/sw/lib/python2.7/gzip.py", line 455, in readline
    c = self.read(readsize)
  File "/sw/lib/python2.7/gzip.py", line 261, in read
    self._read(readsize)
  File "/sw/lib/python2.7/gzip.py", line 296, in _read
    self._read_gzip_header()
  File "/sw/lib/python2.7/gzip.py", line 190, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file

If anyone has a neat trick, I'd like to hear about it. Yes, I know how to catch the exception, but I find it rather clunky to first read a line, then close the file and open it again.

Recommended answer

The best solution for this would be to use something like https://github.com/ahupp/python-magic with libmagic. You simply cannot avoid reading at least a header to identify a file (unless you implicitly trust file extensions).

If you're feeling spartan, the magic number that identifies gzip(1) files is the first two bytes, 0x1f 0x8b:

In [1]: f = open('foo.html.gz', 'rb')
In [2]: print(repr(f.read(2)))
'\x1f\x8b'
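
The same check can be wrapped in a small helper. This is only a sketch: the names `is_gzip` and `GZIP_MAGIC` are made up for illustration and are not part of the gzip module. The bytes literal keeps the comparison valid on Python 3 as well as Python 2.7.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for illustration; it only inspects the header,
    it does not verify that the rest of the file is valid gzip data.
    """
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

A file shorter than two bytes simply yields a shorter read, so the comparison is False rather than an error.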

gzip.open is just a wrapper around GzipFile. You could write a function like this that returns the correct type of object depending on what the source is, without having to open the file twice:

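#!/usr/bin/python

import gzip

# Bytes literal so the comparison also works on Python 3,
# where a file opened in 'rb' mode returns bytes, not str.
GZIP_MAGIC = b'\x1f\x8b'

def opener(filename):
    """Return a readable binary file object, decompressing gzip input."""
    f = open(filename, 'rb')
    is_gzipped = f.read(2) == GZIP_MAGIC
    f.seek(0)  # rewind so the returned object starts at the first byte
    if is_gzipped:
        return gzip.GzipFile(fileobj=f)
    return f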
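
As a variation on the opener function above, the header can be inspected without moving the file position at all. The sketch below assumes Python 3, where `open(..., 'rb')` returns an `io.BufferedReader` with a `peek()` method; `open_maybe_gzip` is a hypothetical name, not something the gzip module provides.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open `path` for binary reading, decompressing it if it is gzipped.

    Hypothetical helper (Python 3 only): relies on BufferedReader.peek().
    """
    f = open(path, "rb")
    # peek() returns buffered bytes without advancing the position.
    # It may return more or fewer bytes than requested, hence the slice.
    if f.peek(2)[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=f)
    return f
```

Either way the caller gets an object with `read()` and `readline()`. One thing to keep in mind is that closing a GzipFile created with `fileobj=` does not close the underlying file, so long-running code may want to keep a reference to it and close it explicitly.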
