Is this a bug in this gzip inflate method?

Question

When searching for how to inflate gzip-compressed data on iOS, the following method appears in a number of results:

- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;

    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;

    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    BOOL done = NO;
    int status;

    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;

    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;
    while (!done)
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;

        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);
        if (status == Z_STREAM_END) done = YES;
        else if (status != Z_OK) break;
    }
    if (inflateEnd (&strm) != Z_OK) return nil;

    // Set real length.
    if (done)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}

But I've come across some data (deflated on a Linux machine with Python's gzip module) that this method, running on iOS, fails to inflate. Here's what's happening:

In the last iteration of the while loop, inflate() returns Z_BUF_ERROR and the loop exits. But inflateEnd(), which is called after the loop, returns Z_OK. The code then concludes that, since inflate() never returned Z_STREAM_END, the inflation failed, and it returns nil.

According to this page, http://www.zlib.net/zlib_faq.html#faq05, Z_BUF_ERROR is not a fatal error, and my tests with a limited set of examples show that the data is successfully inflated if inflateEnd() returns Z_OK, even though the last call to inflate() did not return Z_OK. It seems as if inflateEnd() finished inflating the last chunk of data.

I don't know much about compression and how gzip works, so I'm hesitant to make changes to this code without fully understanding what it does. I'm hoping someone with more knowledge about the topic can shed some light on this potential logic flaw in the code above, and suggest a way to fix it.

Another method that Google turns up, which seems to suffer from the same problem, can be found here: https://github.com/nicklockwood/GZIP/blob/master/GZIP/NSData%2BGZIP.m

Edit:

So, it is a bug! Now, how do we fix it? Below is my attempt. Code review, anyone?

- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;

    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;

    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    int status;

    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;

    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;

    do
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;

        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);

        switch (status) {
            case Z_NEED_DICT:
                status = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
            case Z_STREAM_ERROR:
                (void)inflateEnd(&strm);
                return nil;
        }
    } while (status != Z_STREAM_END);

    (void)inflateEnd (&strm);

    // Set real length.
    if (status == Z_STREAM_END)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}

Edit 2:

Here's a sample Xcode project that illustrates the issue I'm running into. The deflate happens on the server side, and the data is base64-encoded and then URL-encoded before being transported via HTTP. I've embedded the URL-encoded base64 string in ViewController.m. The URL-decode and base64-decode methods, as well as your gzipInflate method, are in NSDataExtension.m.

https://dl.dropboxusercontent.com/u/38893107/gzip/GZIPTEST.zip

Here's the binary file as produced by Python's gzip library:

https://dl.dropboxusercontent.com/u/38893107/gzip/binary.zip

This is the URL-encoded base64 string that gets transported over HTTP: https://dl.dropboxusercontent.com/u/38893107/gzip/urlEncodedBase64.txt

Solution

Yes, it's a bug.

It is in fact correct that if inflate() does not return Z_STREAM_END, then you have not completed inflation. inflateEnd() returning Z_OK doesn't really mean much -- just that it was given a valid state and was able to free the memory.

So inflate() must eventually return Z_STREAM_END before you can declare success. However, Z_BUF_ERROR is not a reason to give up. In that case you simply call inflate() again with more input or more output space. If the stream is complete, you will then get Z_STREAM_END.

From the documentation in zlib.h:

/* ...
Z_BUF_ERROR if no progress is possible or if there was not enough room in the
output buffer when Z_FINISH is used.  Note that Z_BUF_ERROR is not fatal, and
inflate() can be called again with more input and more output space to
continue decompressing.
... */

Update:

Since there is buggy code floating around out there, below is the proper code to implement the desired method. This code handles incomplete gzip streams, concatenated gzip streams, and very large gzip streams. For very large gzip streams, the unsigned lengths in the z_stream are not large enough when compiled as a 64-bit executable. NSUInteger is 64 bits, whereas unsigned is 32 bits. In that case, you have to loop on the input to feed it to inflate().
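The input-feeding step described here boils down to clamping a 64-bit count to the 32-bit unsigned avail_in field before each refill. A minimal sketch in plain C, using uint64_t as a stand-in for NSUInteger on a 64-bit build (an assumption) and the hypothetical names take_chunk and count_refills:

```c
#include <limits.h>
#include <stdint.h>

/* Take the next piece of at most UINT_MAX bytes from a 64-bit remaining count,
   the same clamping applied before assigning strm.avail_in. */
static unsigned take_chunk(uint64_t *left) {
    unsigned chunk = *left > UINT_MAX ? UINT_MAX : (unsigned)*left;
    *left -= chunk;
    return chunk;
}

/* How many times inflate() would need its input refilled for `total` bytes. */
static unsigned count_refills(uint64_t total) {
    unsigned refills = 0;
    while (total > 0) {
        (void)take_chunk(&total);
        refills++;
    }
    return refills;
}
```

For example, 2 * UINT_MAX + 5 bytes of input would be handed to inflate() in three pieces: UINT_MAX, UINT_MAX, and 5.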

The method below simply returns nil on any error. The nature of the error is noted in a comment after each return nil statement, in case more sophisticated error handling is desired.

- (NSData *) gzipInflate
{
    z_stream strm;

    // Initialize input
    strm.next_in = (Bytef *)[self bytes];
    NSUInteger left = [self length];        // input left to decompress
    if (left == 0)
        return nil;                         // incomplete gzip stream

    // Create starting space for output (guess double the input size, will grow
    // if needed -- in an extreme case, could end up needing more than 1000
    // times the input size)
    NSUInteger space = left << 1;
    if (space < left)
        space = NSUIntegerMax;
    NSMutableData *decompressed = [NSMutableData dataWithLength: space];
    space = [decompressed length];

    // Initialize output
    strm.next_out = (Bytef *)[decompressed mutableBytes];
    NSUInteger have = 0;                    // output generated so far

    // Set up for gzip decoding
    strm.avail_in = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    int status = inflateInit2(&strm, (15+16));
    if (status != Z_OK)
        return nil;                         // out of memory

    // Decompress all of self
    do {
        // Allow for concatenated gzip streams (per RFC 1952)
        if (status == Z_STREAM_END)
            (void)inflateReset(&strm);

        // Provide input for inflate
        if (strm.avail_in == 0) {
            strm.avail_in = left > UINT_MAX ? UINT_MAX : (unsigned)left;
            left -= strm.avail_in;
        }

        // Decompress the available input
        do {
            // Allocate more output space if none left
            if (space == have) {
                // Double space, handle overflow
                space <<= 1;
                if (space < have) {
                    space = NSUIntegerMax;
                    if (space == have) {
                        // space was already maxed out!
                        (void)inflateEnd(&strm);
                        return nil;         // output exceeds integer size
                    }
                }

                // Increase space
                [decompressed setLength: space];
                space = [decompressed length];

                // Update output pointer (might have moved)
                strm.next_out = (Bytef *)[decompressed mutableBytes] + have;
            }

            // Provide output space for inflate
            strm.avail_out = space - have > UINT_MAX ? UINT_MAX :
                             (unsigned)(space - have);
            have += strm.avail_out;

            // Inflate and update the decompressed size
            status = inflate (&strm, Z_SYNC_FLUSH);
            have -= strm.avail_out;

            // Bail out if any errors
            if (status != Z_OK && status != Z_BUF_ERROR &&
                status != Z_STREAM_END) {
                (void)inflateEnd(&strm);
                return nil;                 // invalid gzip stream
            }

            // Repeat until all output is generated from provided input (note
            // that even if strm.avail_in is zero, there may still be pending
            // output -- we're not done until the output buffer isn't filled)
        } while (strm.avail_out == 0);

        // Continue until all input consumed
    } while (left || strm.avail_in);

    // Free the memory allocated by inflateInit2()
    (void)inflateEnd(&strm);

    // Verify that the input is a valid gzip stream
    if (status != Z_STREAM_END)
        return nil;                         // incomplete gzip stream

    // Set the actual length and return the decompressed data
    [decompressed setLength: have];
    return decompressed;
}
