Python: Inflate and Deflate implementations


Problem description

I am interfacing with a server that requires that data sent to it is compressed with Deflate algorithm (Huffman encoding + LZ77) and also sends data that I need to Inflate.

I know that Python includes Zlib, and that the C libraries in Zlib support calls to Inflate and Deflate, but these apparently are not provided by the Python Zlib module. It does provide Compress and Decompress, but when I make a call such as the following:

result_data = zlib.decompress( base64_decoded_compressed_string )

I receive the following error:

Error -3 while decompressing data: incorrect header check

Gzip does no better; when making a call such as:

result_data = gzip.GzipFile( fileobj = StringIO.StringIO( base64_decoded_compressed_string ) ).read()

I receive the error:

IOError: Not a gzipped file

Now I know that there is a Deflate implementation available (Pyflate), but I do not know of an Inflate implementation.

It seems that there are a few options:
1. Find an existing implementation (ideal) of Inflate and Deflate in Python
2. Write my own Python extension to the zlib c library that includes Inflate and Deflate
3. Call something else that can be executed from the command line (such as a Ruby script, since Inflate/Deflate calls in zlib are fully wrapped in Ruby)
4. ?

I am seeking a solution, but lacking a solution I will be thankful for insights, constructive opinions, and ideas.

Additional information: The result of deflating (and encoding) a string should, for the purposes I need, give the same result as the following snippet of C# code, where the input parameter is an array of UTF bytes corresponding to the data to compress:

public static string DeflateAndEncodeBase64(byte[] data)
{
    if (null == data || data.Length < 1) return null;
    string compressedBase64 = "";

    //write into a new memory stream wrapped by a deflate stream
    using (MemoryStream ms = new MemoryStream())
    {
        using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            //write byte buffer into memorystream
            deflateStream.Write(data, 0, data.Length);
            deflateStream.Close();

            //rewind memory stream and write to base 64 string
            byte[] compressedBytes = new byte[ms.Length];
            ms.Seek(0, SeekOrigin.Begin);
            ms.Read(compressedBytes, 0, (int)ms.Length);
            compressedBase64 = Convert.ToBase64String(compressedBytes);
        }
    }
    return compressedBase64;
}

Running this .NET code for the string "deflate and encode me" gives the result "7b0HYBxJliUmL23Ke39K9UrX4HShCIBgEyTYkEAQ7MGIzeaS7B1pRyMpqyqBymVWZV1mFkDM7Z28995777333nvvvfe6O51OJ/ff/z9cZmQBbPbOStrJniGAqsgfP358Hz8iZvl5mbV5mi1nab6cVrM8XeT/Dw=="

When "deflate and encode me" is run through Python's zlib.compress() and then base64 encoded, the result is "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k=".

It is clear that zlib.compress() is not an implementation of the same algorithm as the standard Deflate algorithm.

More information:

The first 2 bytes of the .NET deflate data ("7b0HY..."), after b64 decoding are 0xEDBD, which does not correspond to Gzip data (0x1f8b), BZip2 (0x425A) data, or Zlib (0x789C) data.

The first 2 bytes of the Python compressed data ("eJxLS..."), after b64 decoding are 0x789C. This is a Zlib header.
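The header comparison above can be reproduced in a few lines. This is a minimal sketch for Python 3; the helper name `looks_like_zlib` is made up for illustration. It applies the RFC 1950 rule that the low nibble of the first byte is the compression method (8 for deflate) and that the first two bytes, read as a big-endian 16-bit number, are a multiple of 31. Only the first eight base64 characters of the .NET output are decoded, since that is enough to see the leading bytes.

```python
import base64

# Leading characters of the .NET output and the full Python output from the question.
DOTNET_B64_PREFIX = "7b0HYBxJ"
PYTHON_B64 = "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k="


def looks_like_zlib(data):
    """Hypothetical helper: True if data starts with a plausible zlib (RFC 1950) header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Compression method 8 (deflate) and a header value divisible by 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


dotnet_bytes = base64.b64decode(DOTNET_B64_PREFIX)
python_bytes = base64.b64decode(PYTHON_B64)

print(dotnet_bytes[:2].hex(), looks_like_zlib(dotnet_bytes))
print(python_bytes[:2].hex(), looks_like_zlib(python_bytes))
```

A check like this only says whether a zlib wrapper is plausibly present; it does not prove the rest of the stream is valid.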

SOLVED
To handle the raw deflate and inflate, without header and checksum, the following things needed to happen:

On deflate/compress: strip the first two bytes (header) and the last four bytes (checksum).
On inflate/decompress: there is a second argument for window size. If this value is negative it suppresses headers.

Here are my current methods, including the base64 encoding/decoding, which work properly:

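import zlib
import base64

def decode_base64_and_inflate( b64string ):
    decoded_data = base64.b64decode( b64string )
    # A negative window size (-15) makes zlib expect a raw deflate stream:
    # no zlib header and no trailing checksum.
    return zlib.decompress( decoded_data , -15)

def deflate_and_base64_encode( string_val ):
    # string_val must be a byte string (bytes in Python 3).
    zlibbed_str = zlib.compress( string_val )
    # Drop the 2-byte zlib header and the 4-byte Adler-32 checksum.
    compressed_string = zlibbed_str[2:-4]
    return base64.b64encode( compressed_string )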
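As an alternative to slicing off the header and checksum, zlib can be asked to emit a raw deflate stream directly by passing a negative window size to `zlib.compressobj`. The following is a sketch for Python 3, assuming the text is encoded as UTF-8 before compression; it reuses the function names from the question purely for continuity and is not the asker's code.

```python
import base64
import zlib


def deflate_and_base64_encode(data: bytes) -> bytes:
    """Raw-deflate data (no zlib header or checksum), then base64-encode it."""
    # wbits=-15 requests a raw deflate stream with a 32 KiB window.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    raw = compressor.compress(data) + compressor.flush()
    return base64.b64encode(raw)


def decode_base64_and_inflate(b64string: bytes) -> bytes:
    """Inverse of deflate_and_base64_encode."""
    return zlib.decompress(base64.b64decode(b64string), -15)


text = "deflate and encode me"
encoded = deflate_and_base64_encode(text.encode("utf-8"))
round_tripped = decode_base64_and_inflate(encoded).decode("utf-8")
print(round_tripped == text)
```

The encoded bytes need not match the .NET output character for character: different compressors can produce different, equally valid deflate streams for the same input. What matters for interoperability is that each side can inflate what the other produced.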


Recommended answer

This is an add-on to MizardX's answer, giving some explanation and background.

See http://www.chiramattel.com/george/blog/2007/09/09/deflatestream-block-length-does-not-match.html

According to RFC 1950, a zlib stream constructed in the default manner is composed of:


  • a 2-byte header (e.g. 0x78 0x9C)

  • a deflate stream (see RFC 1951)

  • an Adler-32 checksum of the uncompressed data (4 bytes)
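The three parts listed above can be checked directly from Python. This is a small sketch for Python 3, assuming a stream produced by `zlib.compress` with default settings (so no preset dictionary follows the header); the variable names are illustrative only.

```python
import zlib

payload = b"deflate and encode me"
stream = zlib.compress(payload)

# Split the zlib stream into the three parts described by RFC 1950.
header, body, trailer = stream[:2], stream[2:-4], stream[-4:]

# The header, read as a big-endian 16-bit number, must be a multiple of 31.
header_ok = int.from_bytes(header, "big") % 31 == 0

# The trailer is the Adler-32 of the uncompressed data, most significant byte first.
checksum_ok = int.from_bytes(trailer, "big") == zlib.adler32(payload)

# The middle part is a raw deflate stream (RFC 1951); negative wbits inflates it.
body_ok = zlib.decompress(body, -15) == payload

print(header_ok, checksum_ok, body_ok)
```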

The C# DeflateStream works on (you guessed it) a deflate stream. MizardX's code is telling the zlib module that the data is a raw deflate stream.

Observations: (1) One hopes that the C# "deflation" method producing a longer string happens only with short input. (2) Using the raw deflate stream without the Adler-32 checksum? A bit risky, unless it is replaced with something better.

Updates

Error message: "Block length does not match with its complement"

If you are trying to inflate some compressed data with the C# DeflateStream and you get that message, then it is quite possible that you are giving it a zlib stream, not a deflate stream.

See How do you use a DeflateStream on part of a file?

Also copy/paste the error message into a Google search and you will get numerous hits (including the one up the front of this answer) saying much the same thing.
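The same mismatch can be handled defensively on the Python side when the framing of incoming data is not known in advance. The following is a sketch for Python 3 with a hypothetical helper, `inflate_any`: it first lets zlib auto-detect a zlib or gzip header (window size 32 + 15) and falls back to raw deflate if that fails. It is a heuristic, since a raw stream could in rare cases begin with bytes that resemble a header.

```python
import zlib


def inflate_any(data: bytes) -> bytes:
    """Hypothetical helper: inflate zlib-, gzip-, or raw-deflate-framed data."""
    try:
        # 32 + MAX_WBITS: accept either a zlib or a gzip header automatically.
        return zlib.decompress(data, 32 + zlib.MAX_WBITS)
    except zlib.error:
        # No recognizable header: retry as a raw deflate stream.
        return zlib.decompress(data, -zlib.MAX_WBITS)


payload = b"deflate and encode me"
zlib_stream = zlib.compress(payload)
raw_stream = zlib_stream[2:-4]  # header and checksum stripped

print(inflate_any(zlib_stream) == payload)
print(inflate_any(raw_stream) == payload)
```

If the format is fixed by the protocol, as with the server in the question, passing the matching window size explicitly is the safer choice.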

The Java Deflater ... used by "the website" ... C# DeflateStream "is pretty straightforward and has been tested against the Java implementation". Which of the following possible Java Deflater constructors is the website using?


public Deflater(int level, boolean nowrap)

Creates a new compressor using the specified compression level. If 'nowrap' is true then the ZLIB header and checksum fields will not be used in order to support the compression format used in both GZIP and PKZIP.

public Deflater(int level)

Creates a new compressor using the specified compression level. Compressed data will be generated in ZLIB format.

public Deflater()

Creates a new compressor with the default compression level. Compressed data will be generated in ZLIB format.

A one-line deflater after throwing away the 2-byte zlib header and the 4-byte checksum:

uncompressed_string.encode('zlib')[2:-4] # does not work in Python 3.x

or

zlib.compress(uncompressed_string)[2:-4]
