Read a gzip file from a URL with zlib in Python 2.7

Question

I'm trying to read a gzip file from a URL in Python 2.7 without saving a temporary file. However, for some reason I get a truncated text file. I have spent quite some time searching the web for a solution, without success. There is no truncation if I save the "raw" data back into a gzip file (see the sample code below). What am I doing wrong?

My sample code:

    import urllib2
    import zlib
    from StringIO import StringIO

    url = "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/clinvar_00-latest.vcf.gz"

    # Create an opener
    opener = urllib2.build_opener() 

    request = urllib2.Request(url)
    request.add_header('Accept-encoding', 'gzip')

    # Fetch the gzip file
    respond = opener.open(request)
    compressedData = respond.read()
    respond.close()

    opener.close()

    # Extract data and save to text file
    compressedDataBuf = StringIO(compressedData)
    d = zlib.decompressobj(16+zlib.MAX_WBITS)

    buffer = compressedDataBuf.read(1024)
    saveFile = open('/tmp/test.txt', "wb")
    while buffer:
        saveFile.write(d.decompress(buffer))
        buffer = compressedDataBuf.read(1024)
    saveFile.close()

    # Save "raw" data to new gzip file.
    saveFile = open('/tmp/test.gz', "wb")
    saveFile.write(compressedData)
    saveFile.close()

Answer

Because that gzip file consists of many concatenated gzip streams, which RFC 1952 permits. The gzip command-line tool automatically decompresses all of the streams, but a single zlib decompression object stops at the end of the first one. That is why your text file is truncated while the saved .gz copy is complete.
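To see the truncation without downloading the ClinVar file, here is a small self-contained reproduction. It is a sketch that builds its own two-stream gzip data in memory; the helper name gzip_member is hypothetical and exists only for this illustration. It uses the same decompressobj(16 + zlib.MAX_WBITS) setup as the question and should run on both Python 2.7 and Python 3.

```python
import zlib


def gzip_member(payload):
    """Hypothetical helper: compress payload as one complete gzip stream."""
    # wbits = 16 + MAX_WBITS makes zlib write a gzip header and trailer.
    c = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return c.compress(payload) + c.flush()


# Two complete gzip streams back to back, as RFC 1952 allows.
data = gzip_member(b"first line\n") + gzip_member(b"second line\n")

# Same decompressor setup as in the question.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
out = d.decompress(data) + d.flush()

print(repr(out))                # only the first stream's contents
print(len(d.unused_data) > 0)   # the second stream was never decompressed
```

The second stream is not lost: its bytes are left untouched in d.unused_data.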

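You need to detect the end of each gzip stream and restart decompression on the compressed data that follows it. Look at unused_data in the documentation of Python's zlib module: once a decompression object reaches the end of a stream, any input bytes past that point are stored in its unused_data attribute instead of being decompressed.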
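A minimal sketch of that approach, assuming the input arrives as an iterable of byte chunks (like the 1024-byte reads in the question). The names iter_gunzip and make_gzip_member are hypothetical, not part of any library. Whenever unused_data is non-empty, the leftover bytes are fed to a fresh decompressobj, so every concatenated stream gets decoded. It is written to run on Python 2.7 and Python 3.

```python
import zlib

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def iter_gunzip(chunks):
    """Hypothetical helper: yield decompressed bytes from compressed chunks.

    A fresh decompressor is started for every gzip stream, so data made
    of several concatenated streams is decoded completely.
    """
    d = zlib.decompressobj(GZIP_WBITS)
    for chunk in chunks:
        while chunk:
            out = d.decompress(chunk)
            if out:
                yield out
            # Non-empty unused_data means the current stream has ended and
            # these leftover bytes belong to the next stream. (If a stream
            # ended exactly at the previous chunk boundary, the finished
            # decompressor leaves this whole chunk in unused_data.)
            chunk = d.unused_data
            if chunk:
                d = zlib.decompressobj(GZIP_WBITS)
    tail = d.flush()
    if tail:
        yield tail


def make_gzip_member(payload):
    """Hypothetical test helper: compress payload as one gzip stream."""
    c = zlib.compressobj(9, zlib.DEFLATED, GZIP_WBITS)
    return c.compress(payload) + c.flush()


if __name__ == "__main__":
    blob = make_gzip_member(b"first\n") + make_gzip_member(b"second\n")
    # Feed the data in 1024-byte pieces, like the loop in the question.
    pieces = [blob[i:i + 1024] for i in range(0, len(blob), 1024)]
    print(repr(b"".join(iter_gunzip(pieces))))
```

With the question's code, the same generator could read straight from the response instead of buffering everything first, for example by iterating over iter_gunzip(iter(lambda: respond.read(1024), b'')) and writing each piece to saveFile. Trailing bytes that are not a valid gzip stream (such as zero padding) would make zlib raise zlib.error in this sketch.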