How to read from a zip file within a zip file in Python?

Question

I have a file that I want to read, and it is itself zipped inside a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble reading child.zip. Can anyone correct my code?

I assume that I need to open child.zip as a file-like object and then pass it to a second ZipFile instance, but being new to Python, my attempt, zipfile.ZipFile(zfile.open(name)), does not work. It raises zipfile.BadZipfile: "File is not a zip file" on child.zip, which I have independently verified to be a valid archive.

import logging
import re
import zipfile

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            # The next line is the one that raises BadZipfile
            with zipfile.ZipFile(zfile.open(name)) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal internal file: " + name2)
                    print("Processing code goes here")

Recommended answer

When you call .open() on a ZipFile instance, you do get an open file handle. However, to read a zip file, the ZipFile class needs a little more: it must be able to seek on that file, and the object returned by .open() is not seekable in your case. Only Python 3 (3.7 and up) produces a ZipExtFile object that supports seeking, and only when the underlying file handle for the outer zip file is seekable and nothing is trying to write to the ZipFile object.
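To see which situation applies on a given interpreter, the handle returned by .open() can be asked directly whether it supports seeking. The following is a minimal sketch, not part of the original answer: the helper name inner_handle_is_seekable and the in-memory sample archive are assumptions made for illustration.

```python
import zipfile
from io import BytesIO


def inner_handle_is_seekable(parent_file, member_name):
    """Report whether the handle returned by ZipFile.open() can seek.

    Hypothetical helper for illustration; parent_file may be a path
    or a file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(parent_file, "r") as outer:
        with outer.open(member_name) as handle:
            # seekable() is part of the standard io.IOBase interface
            return bool(handle.seekable())


# Build a tiny sample archive in memory so the sketch is self-contained.
sample = BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("child.zip", b"placeholder bytes")
sample.seek(0)

result = inner_handle_is_seekable(sample, "child.zip")
print("inner handle seekable:", result)
```

If this prints False, passing the handle straight to a second ZipFile is expected to fail, and the entry has to be buffered first.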

The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable), and feed that to ZipFile:

from io import BytesIO

# ...
        zfiledata = BytesIO(zfile.read(name))
        with zipfile.ZipFile(zfiledata) as zfile2:

Or, in the context of your example:

import logging
import re
import zipfile
from io import BytesIO

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip: buffer it so it is seekable
            zfiledata = BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal internal file: " + name2)
                    print("Processing code goes here")
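The same workaround can be tried end to end without any files on disk. The sketch below builds the parent.zip, child.zip, child.txt layout from the question in memory and then reads child.txt back through a BytesIO buffer. The helper names build_nested_zip and read_nested_member, and the sample text, are assumptions introduced for this illustration.

```python
import zipfile
from io import BytesIO


def build_nested_zip():
    """Return a BytesIO holding parent.zip -> child.zip -> child.txt."""
    child_buf = BytesIO()
    with zipfile.ZipFile(child_buf, "w") as child:
        child.writestr("child.txt", "hello from child.txt")

    parent_buf = BytesIO()
    with zipfile.ZipFile(parent_buf, "w") as parent:
        # Store the raw bytes of the inner archive as one entry
        parent.writestr("child.zip", child_buf.getvalue())
    parent_buf.seek(0)
    return parent_buf


def read_nested_member(parent_file, child_zip_name, member_name):
    """Read one member of a zip archive stored inside another zip."""
    with zipfile.ZipFile(parent_file, "r") as outer:
        # Read the whole inner archive into a seekable in-memory file
        inner_data = BytesIO(outer.read(child_zip_name))
    with zipfile.ZipFile(inner_data) as inner:
        return inner.read(member_name)


if __name__ == "__main__":
    print(read_nested_member(build_nested_zip(), "child.zip", "child.txt"))
```

Note that this approach holds the entire inner archive in memory, so it is best suited to nested archives of moderate size.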
