zlib decompression in Python

Problem description

Okay, so I have some data streams compressed by Python's (2.6) zlib.compress() function. When I try to decompress them, some of them won't decompress (zlib error -5, which seems to be a "buffer error", no idea what to make of that). At first, I thought I was done, but I realized that all the ones I couldn't decompress started with 0x78DA (the working ones were 0x789C), and I looked around and it seems to be a different kind of zlib compression -- the magic number changes depending on the compression used. What can I use to decompress the files? Am I hosed?

Solution

According to RFC 1950, the difference between the "OK" 0x789C and the "bad" 0x78DA is in the FLEVEL bit-field:

  FLEVEL (Compression level)
     These flags are available for use by specific compression
     methods.  The "deflate" method (CM = 8) sets these flags as
     follows:

        0 - compressor used fastest algorithm
        1 - compressor used fast algorithm
        2 - compressor used default algorithm
        3 - compressor used maximum compression, slowest algorithm

     The information in FLEVEL is not needed for decompression; it
     is there to indicate if recompression might be worthwhile.
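
The FLEVEL field quoted above can be read directly off the two header bytes. The following is a minimal sketch, assuming Python 3 (the question used Python 2.6); `parse_zlib_header` is a hypothetical helper name, and the bit layout is the one given in RFC 1950.

```python
import zlib

def parse_zlib_header(data):
    """Decode the 2-byte zlib header (CMF, FLG) described in RFC 1950."""
    cmf, flg = data[0], data[1]
    return {
        "cm": cmf & 0x0F,         # compression method, 8 = deflate
        "cinfo": cmf >> 4,        # log2(window size) - 8
        "fdict": (flg >> 5) & 1,  # preset dictionary flag
        "flevel": flg >> 6,       # compression level hint, 0..3
        "check_ok": (cmf * 256 + flg) % 31 == 0,  # FCHECK validity
    }

ok_header = parse_zlib_header(b"\x78\x9c")
bad_header = parse_zlib_header(b"\x78\xda")
print(ok_header["flevel"], bad_header["flevel"])  # prints: 2 3

# Streams produced at different levels all decompress with the same call.
payload = b"some log line\n" * 100
for level in (1, 6, 9):
    assert zlib.decompress(zlib.compress(payload, level)) == payload
```

Both headers pass the FCHECK test, so neither is corrupt by itself; only the level hint differs.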

"OK" uses 2, "bad" uses 3. So that difference in itself is not a problem.

To get any further, you might consider supplying the following information for each of compressing and (attempted) decompressing: what platform, what version of Python, what version of the zlib library, what was the actual code used to call the zlib module. Also supply the full traceback and error message from the failing decompression attempts. Have you tried to decompress the failing files with any other zlib-reading software? With what results? Please clarify what you have to work with: Does "Am I hosed?" mean that you don't have access to the original data? How did it get from a stream to a file? What guarantee do you have that the data was not mangled in transmission?
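
Most of the version details requested above can be gathered with a few lines. This is a sketch assuming Python 3; `describe_environment` is a hypothetical helper name. On Python 2.6 only `zlib.ZLIB_VERSION` is available, since `zlib.ZLIB_RUNTIME_VERSION` was added in Python 3.3.

```python
import platform
import sys
import zlib

def describe_environment():
    """Collect the platform, Python and zlib details worth quoting in a report."""
    return {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "zlib_compiled": zlib.ZLIB_VERSION,         # version Python was built against
        "zlib_runtime": zlib.ZLIB_RUNTIME_VERSION,  # version actually loaded
    }

env = describe_environment()
for key, value in env.items():
    print("%s: %s" % (key, value))
```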

UPDATE: Some observations based on the partial clarifications published in your self-answer:

You are using Windows. Windows distinguishes between binary mode and text mode when reading and writing files. When reading in text mode, Python 2.x changes '\r\n' to '\n', and changes '\n' to '\r\n' when writing. This is not a good idea when dealing with non-text data. Worse, when reading in text mode, '\x1a' aka Ctrl-Z is treated as end-of-file.
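
The effect on binary data can be illustrated without a Windows machine. The sketch below only simulates the translations described above on a bytes object (Python 3 assumed); `simulate_text_write` and `simulate_text_read` are hypothetical names, not the C runtime's actual code.

```python
def simulate_text_write(data):
    """Mimic a Windows text-mode write: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")

def simulate_text_read(data):
    """Mimic a Python 2 text-mode read on Windows: stop at Ctrl-Z, map CR LF to LF."""
    end = data.find(b"\x1a")
    if end != -1:
        data = data[:end]  # everything from Ctrl-Z onwards is never seen
    return data.replace(b"\r\n", b"\n")

# Nine arbitrary bytes containing two LF bytes and one Ctrl-Z.
binary_blob = b"\x78\xda\x0a\x01\x1a\xff\x0d\x0a\x02"
written = simulate_text_write(binary_blob)
read_back = simulate_text_read(binary_blob)
print(len(binary_blob), len(written), len(read_back))  # prints: 9 11 4
```

Neither result equals the original nine bytes, which is why compressed data must go through 'rb' and 'wb'.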

To compress a file:

# other superstructure left as an exercise
import zlib
str_object1 = open('my_log_file', 'rb').read()  # BINARY mode
str_object2 = zlib.compress(str_object1, 9)     # level 9 = maximum compression
f = open('compressed_file', 'wb')               # BINARY mode
f.write(str_object2)
f.close()

To decompress a file:

import zlib
str_object1 = open('compressed_file', 'rb').read()  # BINARY mode
str_object2 = zlib.decompress(str_object1)
f = open('my_recovered_log_file', 'wb')             # BINARY mode
f.write(str_object2)
f.close()

Aside: It is better to use the gzip module, which saves you having to think about nasties like text mode, at the cost of a few bytes for the extra header info.
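
A minimal sketch of the gzip route, assuming Python 3 and a throwaway temporary directory; the file name is illustrative only. `gzip.open` works in binary mode unless a text mode is requested explicitly, so LF, CR LF and Ctrl-Z bytes pass through untouched.

```python
import gzip
import os
import tempfile

# Sample data that deliberately contains LF, CR LF and Ctrl-Z bytes.
original = b"line one\nline two\x1a\r\nline three\n" * 50

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "my_log_file.gz")
    with gzip.open(path, "wb") as f:  # binary write, no newline translation
        f.write(original)
    with gzip.open(path, "rb") as f:  # binary read, Ctrl-Z is just a byte
        recovered = f.read()

print(recovered == original)  # prints: True
```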

If you have been using 'rb' and 'wb' in your compression code but not in your decompression code [unlikely?], you are not hosed, you just need to flesh out the above decompression code and go for it.

Note carefully the use of "may", "should", etc in the following untested ideas.

If you have not been using 'rb' and 'wb' in your compression code, the probability that you have hosed yourself is rather high.

If there were any instances of '\x1a' in your original file, any data after the first such is lost -- but in that case it shouldn't fail on decompression (IOW this scenario doesn't match your symptoms).

If a Ctrl-Z was generated by zlib itself, this should cause an early EOF upon attempted decompression, which should of course cause an exception. In this case you may be able to gingerly reverse the process by reading the compressed file in binary mode and then replacing '\r\n' with '\n' [i.e. simulate text mode without the Ctrl-Z -> EOF gimmick]. Decompress the result. [Edit: Write the result out in TEXT mode.]
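
The early-EOF case can be imitated by cutting a compressed stream short. This is a sketch assuming Python 3, with made-up log data; the truncation point (half the stream) merely stands in for wherever a Ctrl-Z byte happened to fall. A truncated stream is typically reported by `zlib.decompress` as error -5, while a decompression object returns whatever it managed to decode, which can help with diagnosis.

```python
import zlib

original = b"2010-01-01 12:00:00 INFO something happened\n" * 200
compressed = zlib.compress(original, 9)

# Simulate a text-mode read that stopped early: keep only the first half.
truncated = compressed[: len(compressed) // 2]

try:
    zlib.decompress(truncated)
    error_message = None
except zlib.error as exc:
    error_message = str(exc)
print(error_message)

# A decompression object does not raise on short input; it returns
# the bytes decoded so far and leaves eof set to False.
d = zlib.decompressobj()
partial = d.decompress(truncated)
print(d.eof, len(partial), len(original))
```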

UPDATE 2: I can reproduce your symptoms -- with ANY level 1 to 9 -- with the following script:

import zlib, sys
fn = sys.argv[1]
level = int(sys.argv[2])
s1 = open(fn).read() # TEXT mode
s2 = zlib.compress(s1, level)
f = open(fn + '-ct', 'w') # TEXT mode
f.write(s2)
f.close()
# try to decompress in text mode
s1 = open(fn + '-ct').read() # TEXT mode
s2 = zlib.decompress(s1) # error -5
f = open(fn + '-dtt', 'w')
f.write(s2)
f.close()

Note: you will need to use a reasonably large text file (I used an 80kb source file) to ensure that the compressed data will contain a '\x1a'.

I can recover with this script:

import zlib, sys
fn = sys.argv[1]
# (1) reverse the text-mode write
# can't use text-mode read as it will stop at Ctrl-Z
s1 = open(fn, 'rb').read() # BINARY mode
s1 = s1.replace('\r\n', '\n')
# (2) reverse the compression
s2 = zlib.decompress(s1)
# (3) reverse the text mode read
f = open(fn + '-fixed', 'w') # TEXT mode
f.write(s2)
f.close()

NOTE: If there is a '\x1a' aka Ctrl-Z byte in the original file, and the file is read in text mode, that byte and all following bytes will NOT be included in the compressed file, and thus can NOT be recovered. For a text file (e.g. source code), this is no loss at all. For a binary file, you are most likely hosed.
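
The recovery idea can also be checked end to end on any platform by simulating the text-mode write in memory. This is a sketch under stated assumptions: Python 3, seeded pseudo-random bytes as stand-in data, and a hypothetical `text_mode_write` helper that performs only the LF to CR LF translation. Because every LF written in text mode gains exactly one CR, replacing CR LF with LF restores the compressed bytes exactly.

```python
import random
import zlib

def text_mode_write(data):
    """Simulated Windows text-mode write: LF -> CR LF."""
    return data.replace(b"\n", b"\r\n")

# Pseudo-random bytes barely compress, so the compressed stream is large
# and contains LF bytes for the simulated write to mangle.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
compressed = zlib.compress(original, 9)
assert b"\n" in compressed

on_disk = text_mode_write(compressed)

try:
    zlib.decompress(on_disk)
    failed = False
except zlib.error:
    failed = True

# (1) reverse the text-mode write, (2) reverse the compression
repaired = on_disk.replace(b"\r\n", b"\n")
recovered = zlib.decompress(repaired)
print(failed, repaired == compressed, recovered == original)
```

The simulation leaves out the Ctrl-Z truncation on the original file, which, as the note says, cannot be undone.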

Update 3 [following late revelation that there's an encryption/decryption layer involved in the problem]:

The "Error -5" message indicates that the data that you are trying to decompress has been mangled since it was compressed. If it's not caused by using text mode on the files, suspicion obviously(?) falls on your decryption and encryption wrappers. If you want help, you need to divulge the source of those wrappers. In fact, what you should try to do is (like I did) put together a small script that reproduces the problem on more than one input file. Secondly (like I did), see whether, and under what conditions, you can reverse the process. If you want help with the second stage, you need to divulge the problem-reproduction script.
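
A problem-reproduction script of the kind suggested above might look like the following sketch (Python 3 assumed). The wrappers here are hypothetical stand-ins, since the real encryption and decryption code was never shown: `xor_wrap` is a toy reversible transform, and `lossy_unwrap` contains a deliberate bug so that the harness has something to detect.

```python
import zlib

def find_roundtrip_failures(wrap, unwrap, samples):
    """Return indexes of samples that fail compress -> wrap -> unwrap -> decompress."""
    failures = []
    for index, sample in enumerate(samples):
        try:
            restored = zlib.decompress(unwrap(wrap(zlib.compress(sample, 9))))
        except zlib.error:
            failures.append(index)
            continue
        if restored != sample:
            failures.append(index)
    return failures

# Hypothetical stand-ins for the real encryption/decryption wrappers.
def xor_wrap(data, key=0x5A):
    return bytes(b ^ key for b in data)  # XOR is its own inverse

def lossy_unwrap(data, key=0x5A):
    # Deliberate bug for illustration: the final byte is dropped.
    return bytes(b ^ key for b in data)[:-1]

samples = [b"", b"hello world\n" * 10, bytes(range(256)) * 4]
good = find_roundtrip_failures(xor_wrap, xor_wrap, samples)
bad = find_roundtrip_failures(xor_wrap, lossy_unwrap, samples)
print(good, bad)  # prints: [] [0, 1, 2]
```

Running the same harness against the real wrappers with several input files would show whether the wrappers or the file handling is doing the mangling.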
