How to know when the output buffer is too small when decompressing with LZ4?

Question

The documentation of LZ4_decompress_safe says:


/*! LZ4_decompress_safe() :
    compressedSize : is the precise full size of the compressed block.
    maxDecompressedSize : is the size of destination buffer, which must be already allocated.
    return : the number of bytes decompressed into destination buffer (necessarily <= maxDecompressedSize)
             If destination buffer is not large enough, decoding will stop and output an error code (<0).
             If the source stream is detected malformed, the function will stop decoding and return a negative result.
             This function is protected against buffer overflow exploits, including malicious data packets.
             It never writes outside output buffer, nor reads outside input buffer.
*/
LZ4LIB_API int LZ4_decompress_safe (const char* source, char* dest, int compressedSize, int maxDecompressedSize);


But it doesn't specify how to distinguish a destination buffer that is too small from malformed input, a bad combination of parameters, and so on.

In the case where I don't know what the target decompressed size is, how can I know whether or not I should retry with a bigger buffer?

Recommended answer

There is an issue opened about this, and for now there is no public API to distinguish between errors.

As a heuristic, looking at the code shows the possible return values:


    /* end of decoding */
    if (endOnInput)
       return (int) (((char*)op)-dest);     /* Nb of output bytes decoded */
    else
       return (int) (((const char*)ip)-source);   /* Nb of input bytes read */

    /* Overflow error detected */
_output_error:
    return (int) (-(((const char*)ip)-source))-1;


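So there are only 2 cases:

  • either the decoding succeeded, and you get a positive result (whose meaning depends on whether you are in full or partial mode),

  • or the decoding failed, and you get a negative result.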

In the case of a negative result, the value is -(position_in_input + 1).
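As a small illustration, the failure position can be recovered from a negative result by inverting that formula. The helper name `failure_offset` is hypothetical, and the mapping relies on the return convention visible in the snippet above, which is an implementation detail rather than a documented guarantee.

```c
/* Hypothetical helper, not part of the LZ4 API.
 * Maps a negative result r = -(position_in_input + 1) back to
 * position_in_input. Returns -1 when r is not an error code. */
static long long failure_offset(int result)
{
    if (result >= 0)
        return -1;                        /* success: no failure position */
    return -((long long)result) - 1;      /* widen first so INT_MIN cannot overflow */
}
```

For instance, a result of -1 means that decoding failed at the very first input byte.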

This suggests that one can guess whether the destination buffer was too small, with a good likelihood of success, by retrying with a (much) bigger buffer and checking whether the failure occurs at the same position:


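  • if the second decompression attempt fails at the same position, then the issue is likely with the input,

  • otherwise, you have to try again with an even bigger buffer.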

In other words: as long as the result differs from the previous attempt, try again; once it stops changing, that is your result.
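The retry heuristic could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `decode_fn`, `decompress_with_retry` and `toy_decode` are hypothetical names. The decoder is passed as a function pointer with the same signature as LZ4_decompress_safe so that the sketch compiles without liblz4, and `toy_decode` is a stand-in (not LZ4) that merely follows the same -(position + 1) error convention.

```c
#include <limits.h>
#include <stdlib.h>

/* Same signature as LZ4_decompress_safe, so the real function can be passed. */
typedef int (*decode_fn)(const char* src, char* dst, int srcSize, int dstCapacity);

/* Toy stand-in decoder (NOT LZ4): the input is a list of (count, byte) pairs.
 * On failure it returns -(position_in_input + 1), like the snippet above. */
static int toy_decode(const char* src, char* dst, int srcSize, int dstCapacity)
{
    int ip = 0, op = 0;
    while (ip < srcSize) {
        if (ip + 1 >= srcSize) return -ip - 1;          /* truncated pair */
        int count = (unsigned char)src[ip];
        if (op + count > dstCapacity) return -ip - 1;   /* output too small */
        for (int i = 0; i < count; i++) dst[op++] = src[ip + 1];
        ip += 2;
    }
    return op;
}

/* Hypothetical helper: doubles the buffer while the negative result keeps
 * changing. firstCapacity is assumed to be >= 1.
 * Success: returns the decoded size and stores the malloc'ed buffer in *out.
 * Failure: returns the last negative result (INT_MIN if malloc failed)
 * and leaves *out set to NULL. */
static int decompress_with_retry(decode_fn decode, const char* src, int srcSize,
                                 int firstCapacity, int maxCapacity, char** out)
{
    int capacity = firstCapacity;
    int previous = 0;                 /* 0 means "no failure seen yet" */
    *out = NULL;

    for (;;) {
        char* dst = malloc((size_t)capacity);
        if (dst == NULL) return INT_MIN;

        int result = decode(src, dst, srcSize, capacity);
        if (result >= 0) { *out = dst; return result; }
        free(dst);

        /* Same failure position as with the smaller buffer: likely bad input. */
        if (result == previous) return result;
        /* Stop once the caller-provided cap has been tried. */
        if (capacity >= maxCapacity) return result;

        previous = result;
        capacity = (capacity > maxCapacity / 2) ? maxCapacity : capacity * 2;
    }
}
```

With the real library, one would presumably pass LZ4_decompress_safe as `decode`. The `maxCapacity` cap is a design choice of this sketch so that malicious or corrupted input cannot drive the allocation size up indefinitely.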

Limitation

The input pointer does not necessarily advance one byte at a time: in two places it may advance by length bytes, where length is read from the input and is unbounded.

If decoding fails because the output buffer was too small, and the new output buffer is still too small for length, then decoding will fail at the same position even though the input is not (necessarily) malformed.

If false positives are an issue, then one may attempt to:


  • decode the length, by checking the input stream at the position returned,

  • simply allocate 255 * <input size> - 2526 as per Mark Adler's answer, which is reasonable for small inputs.
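The second option might look like the sketch below. The helper name `worst_case_capacity` is hypothetical; the formula is taken as quoted above without re-deriving it. Returning 0 when the formula is not positive (very small inputs) or when it no longer fits in an int is a choice of this sketch, made because LZ4_decompress_safe takes its sizes as int.

```c
#include <limits.h>

/* Hypothetical helper, not part of the LZ4 API.
 * Returns 255 * compressedSize - 2526, or 0 when that value is not
 * positive or does not fit in an int. */
static int worst_case_capacity(int compressedSize)
{
    /* Compute in long long so the multiplication cannot overflow an int. */
    long long bound = 255LL * compressedSize - 2526;
    if (bound <= 0 || bound > INT_MAX)
        return 0;
    return (int)bound;
}
```

A caller would check for 0 and fall back to another strategy in that case, for example the retry heuristic described earlier in the answer.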
