How to simulate ZipFile.open in Python 2.5?

Question

I want to extract a file from a zip to a specific path, ignoring the file path in the archive. This is very easy in Python 2.6 (my docstring is longer than the code)

import shutil
import zipfile

def extract_from_zip(name, dest_path, zip_file):
    """Similar to zipfile.ZipFile.extract but extracts the file given by name
    from the zip_file (instance of zipfile.ZipFile) to the given dest_path
    *ignoring* the filename path given in the archive completely
    instead of preserving it as extract does.
    """
    # copyfileobj copies in fixed-size chunks; "with" closes the output file.
    with open(dest_path, 'wb') as dest_file:
        shutil.copyfileobj(zip_file.open(name), dest_file)


extract_from_zip('path/to/file.dat', 'output.txt', zipfile.ZipFile('test.zip', 'r'))

But in Python 2.5, the ZipFile.open method is not available. I couldn't find a solution on Stack Overflow, but this forum post had a nice solution that uses ZipInfo.file_offset to seek to the right point in the zip and zlib.decompressobj to unpack the bytes from there. Unfortunately, ZipInfo.file_offset was removed in Python 2.5!

So, given that all we have in Python 2.5 is the ZipInfo.header_offset, I figured I'd just have to parse and skip over the header structure to get to the file offset myself. Using Wikipedia as a reference (I know) I came up with this much longer and not very elegant solution.

import struct
import zipfile
import zlib

def extract_from_zip(name, dest_path, zip_file):
    """Python 2.5 version :("""
    info = zip_file.getinfo(name)
    if info.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        # Negative wbits: a raw deflate stream with no zlib header/trailer.
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    else:
        # Python 2 spells this exception BadZipfile (lowercase "f").
        raise zipfile.BadZipfile("Unrecognized compression method")

    # Seek over the fixed size fields to the "file name length" field in
    # the local file header (26 bytes). Unpack this and the "extra field
    # length" field ourselves, as info.extra doesn't seem to be the
    # correct length.
    zip_file.fp.seek(info.header_offset + 26)
    file_name_len, extra_len = struct.unpack("<HH", zip_file.fp.read(4))
    zip_file.fp.seek(info.header_offset + 30 + file_name_len + extra_len)

    bytes_to_read = info.compress_size
    dest_file = open(dest_path, 'wb')
    try:
        while bytes_to_read > 0:
            buff = zip_file.fp.read(min(bytes_to_read, 102400))
            if not buff:
                break
            bytes_to_read -= len(buff)
            if decoder:
                buff = decoder.decompress(buff)
            dest_file.write(buff)

        if decoder:
            # Feed a dummy pad byte so zlib finishes the stream, then
            # flush any remaining output.
            dest_file.write(decoder.decompress('Z'))
            dest_file.write(decoder.flush())
    finally:
        dest_file.close()

Note how I unpack and read the field that gives the length of the extra field, because calling len on the ZipInfo.extra attribute gives 4 bytes less, thus causing the offset to be calculated incorrectly. Perhaps I'm missing something here?
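About the ZipInfo.extra discrepancy: zipfile fills ZipInfo.extra from the central directory entry. The ZIP format does not require that field to match the extra field in the member's local file header, so reading the lengths from the local header (as the code above does) is the reliable approach. The sketch below goes one step further. It parses the whole 30-byte local header in a single struct call and checks the `PK\x03\x04` signature before trusting the lengths. It is written for Python 3 so it can run as-is, and `local_data_offset` is a hypothetical helper name.

```python
import struct
import zipfile

# Local file header layout (little-endian), 30 bytes in total:
# signature(4) version(2) flags(2) method(2) time(2) date(2)
# crc32(4) compressed size(4) uncompressed size(4)
# file name length(2) extra field length(2)
LOCAL_HEADER = struct.Struct("<4s5H3I2H")
LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"


def local_data_offset(fp, header_offset):
    """Return the offset of a member's data, given ZipInfo.header_offset.

    Both variable lengths are taken from the *local* header, because its
    extra field need not match the central directory copy (ZipInfo.extra).
    """
    fp.seek(header_offset)
    fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
    if fields[0] != LOCAL_HEADER_SIGNATURE:
        raise zipfile.BadZipFile("bad local file header signature")
    name_len, extra_len = fields[-2], fields[-1]
    return header_offset + LOCAL_HEADER.size + name_len + extra_len
```

Checking the signature adds little cost, and it turns a wrong `header_offset` into an immediate error instead of silently reading garbage.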

Can anyone improve on this solution for Python 2.5?

I should have said that the obvious solution suggested by ChrisAdams,

dest_file.write(zip_file.read(name))

will fail with a MemoryError for any sufficiently large file in the zip, because it reads the whole file into memory in one go. I have large files, so I need to stream the contents out to disk.

Also, upgrading Python is the obvious solution, but one that is entirely out of my hands and essentially impossible.

Recommended answer

Given my constraints, it looks like the answer was given in my question: parse the ZipFile structure yourself and use zlib.decompressobj to unzip the bytes once you've found them.
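Here is a minimal sketch of that approach. It is written for Python 3 so it can run as-is; the question's 2.5 code performs the same seek, struct and zlib steps. `iter_member_chunks` is a hypothetical helper name. Like the question's code, it relies on the undocumented `ZipFile.fp` attribute, and it handles only stored and deflated members. One change from the question's loop: it passes `max_length` to `decompress()` and feeds `unconsumed_tail` back in, which caps every output chunk. Without that cap, a small compressed read could still expand into a large block in memory.

```python
import struct
import zipfile
import zlib


def iter_member_chunks(fp, info, chunk_size=64 * 1024):
    """Yield the uncompressed bytes of one archive member, chunk by chunk.

    fp is the archive's underlying file object; info is the member's ZipInfo.
    """
    # Skip the 30-byte local header plus its variable-length name and extra
    # fields, reading both lengths from the local header itself.
    fp.seek(info.header_offset + 26)
    name_len, extra_len = struct.unpack("<HH", fp.read(4))
    fp.seek(info.header_offset + 30 + name_len + extra_len)

    if info.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate stream
    else:
        raise zipfile.BadZipFile("unsupported compression method")

    # The local header's size fields may be zero (flag bit 3 set), so the
    # compressed size comes from the central directory entry instead.
    remaining = info.compress_size
    while remaining > 0:
        buff = fp.read(min(remaining, chunk_size))
        if not buff:
            raise EOFError("archive is truncated")
        remaining -= len(buff)
        if decoder is None:
            yield buff
            continue
        # max_length bounds each output chunk; input that was not processed
        # yet is kept in unconsumed_tail and fed back in on the next call.
        out = decoder.decompress(buff, chunk_size)
        while out:
            yield out
            out = decoder.decompress(decoder.unconsumed_tail, chunk_size)
    if decoder is not None:
        tail = decoder.flush()
        if tail:
            yield tail


def extract_from_zip(name, dest_path, zip_file):
    """Stream one member of zip_file to dest_path, ignoring its archive path."""
    info = zip_file.getinfo(name)
    with open(dest_path, "wb") as dest_file:
        for chunk in iter_member_chunks(zip_file.fp, info):
            dest_file.write(chunk)
```

Keeping the generator separate from the file writing makes the seek-and-decompress logic testable on its own. `extract_from_zip` keeps the question's signature.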

If you don't have (/suffer from) my constraints, you can find better answers here:

  1. If you can, just upgrade from Python 2.5 to 2.6 (or later!), as Daenyth suggested in a comment.
  2. If the zip contains only small files that can be loaded entirely into memory, use ChrisAdams' answer.
  3. If you can introduce a dependency on an external utility, make an appropriate system call to /usr/bin/unzip or similar, as suggested in Vlad's answer.
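The system-call route in option 3 can be sketched as follows, assuming an Info-ZIP style `unzip` is on the PATH. `unzip -p` writes a member's data to standard output. Redirecting that output into the destination file ignores the archive path and keeps the data out of Python's memory. `extract_with_unzip` is a hypothetical name, and the code is written for Python 3. Member names containing wildcard characters such as `*`, `?` or `[` may need escaping, because `unzip` treats its file arguments as patterns.

```python
import subprocess


def extract_with_unzip(zip_path, name, dest_path):
    """Stream one archive member to dest_path using the external unzip tool."""
    with open(dest_path, "wb") as dest_file:
        # -p: extract to stdout, which is redirected into dest_file.
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(["unzip", "-p", zip_path, name], stdout=dest_file)
```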
