Python LZMA: Compressed data ended before the end-of-stream marker was reached

Question

I am using Python's built-in lzma module to decode compressed chunks of data. Depending on the chunk, I get the following exception:

Compressed data ended before the end-of-stream marker was reached
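
For reference, the same message can be triggered on purpose by cutting the last few bytes (part of the stream footer) off a valid .xz stream produced by lzma.compress(). This is only a minimal reproduction under that assumption. It does not show how the asker's own chunks came to lack the marker.

```python
import lzma

# Hypothetical input: a valid .xz stream (lzma.compress() uses FORMAT_XZ by
# default) with its last 10 bytes cut off, so the stream footer is incomplete.
payload = b"hello lzma " * 100
truncated = lzma.compress(payload)[:-10]

error_message = None
try:
    lzma.decompress(truncated)
except lzma.LZMAError as exc:
    error_message = str(exc)

print(error_message)
```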

The data is NOT corrupted. It can be decompressed correctly with other tools, so it must be a bug in the library. Other people are experiencing the same issue:

  • http://bugs.python.org/issue21872
  • https://github.com/peterjc/backports.lzma/issues/6
  • Downloading large file in python error: Compressed file ended before the end-of-stream marker was reached

Unfortunately, no one seems to have found a solution yet, at least not one that works on Python 3.5.

How can I solve this problem? Is there any work around?

Recommended answer

I spent a lot of time trying to understand and solve this problem, so I thought it would be a good idea to share the solution. The problem seems to be caused by chunks of data whose end-of-stream (EOF) marker is missing. To decompress a buffer, I used lzma.decompress from Python's lzma library. However, this function expects each buffer to contain a complete stream, including its end-of-stream marker. Otherwise it raises an LZMAError exception.
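
The difference the workaround relies on can be checked directly. If you feed the same kind of truncated stream to an LZMADecompressor object, it returns whatever output it could decode and reports eof as False. It does not raise. The truncated stream here is again an assumed stand-in for the problematic chunks.

```python
import lzma

payload = bytes(range(256)) * 200
# Hypothetical damaged input: the stream footer is cut off.
truncated = lzma.compress(payload)[:-10]

decomp = lzma.LZMADecompressor()  # format defaults to FORMAT_AUTO
partial = decomp.decompress(truncated)

# No exception: the decompressor is simply still waiting for the
# end-of-stream marker, which it reports through decomp.eof.
print(len(partial), decomp.eof)
```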

To work around this limitation, we can implement an alternative decompression function that uses an LZMADecompressor object to extract the data from a buffer. For example:

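from lzma import LZMADecompressor, LZMAError, FORMAT_AUTO

def decompress_lzma(data):
    """Like lzma.decompress(), but return the data decoded so far instead of
    raising when the input ends before the end-of-stream marker."""
    results = []
    while True:
        decomp = LZMADecompressor(FORMAT_AUTO, None, None)
        try:
            res = decomp.decompress(data)
        except LZMAError:
            if results:
                break  # Leftover data is not a valid LZMA/XZ stream; ignore it.
            else:
                raise  # Error on the first iteration; bail out.
        results.append(res)
        # unused_data holds the bytes after the end of this stream (e.g. a
        # second, concatenated stream); it stays empty until decomp.eof is True.
        data = decomp.unused_data
        if not data:
            # Whole buffer consumed: stop even if decomp.eof is False, i.e. the
            # end-of-stream marker was never reached (lzma.decompress() raises here).
            break
    return b"".join(results)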

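This function is similar to lzma.decompress() from the standard lzma library, with one key difference: the loop exits as soon as the entire buffer has been processed, without checking whether the end-of-stream marker was reached. As a result, a buffer that lacks the marker returns the data decompressed so far instead of raising LZMAError. Keep in mind that this also silently accepts input that really is truncated.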
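
When chunks arrive one at a time (for example, while downloading a large file, as in one of the linked reports), the same idea can be applied incrementally. The sketch below is not part of the original answer. decompress_chunks is a hypothetical helper that keeps one LZMADecompressor across chunks and starts a new one for each concatenated stream. It also returns a completeness flag, so the fact that the end-of-stream marker was missing is neither raised nor silently dropped.

```python
import lzma
from typing import Iterable, Tuple


def decompress_chunks(chunks: Iterable[bytes]) -> Tuple[bytes, bool]:
    """Hypothetical helper: decompress LZMA/XZ data that arrives in pieces.

    Returns (data, complete). `complete` is False when the input ran out
    before the last stream's end-of-stream marker was seen.
    """
    out = []
    decomp = lzma.LZMADecompressor()  # FORMAT_AUTO
    finished_streams = 0
    for chunk in chunks:
        while chunk:
            if decomp.eof:
                # The previous stream ended; the remaining bytes start a new one.
                finished_streams += 1
                decomp = lzma.LZMADecompressor()
            try:
                out.append(decomp.decompress(chunk))
            except lzma.LZMAError:
                if finished_streams:
                    # Trailing non-LZMA data after a complete stream is
                    # ignored, as lzma.decompress() does.
                    return b"".join(out), True
                raise
            # unused_data is only non-empty once this stream's marker was seen.
            chunk = decomp.unused_data
    return b"".join(out), decomp.eof


if __name__ == "__main__":
    blob = lzma.compress(b"example data " * 1000)
    pieces = [blob[i:i + 16] for i in range(0, len(blob), 16)]
    print(decompress_chunks(pieces)[1])       # complete stream -> True
    print(decompress_chunks(pieces[:-1])[1])  # last piece missing -> False
```

Returning the flag instead of raising lets the caller decide. Partial output can be accepted for data known to be intact (as in the question), or complete == False can be treated as an error when the input might really be truncated.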

I hope this can be useful to other people.
