Deflate压缩块的结构 [英] The structure of Deflate compressed block

查看:269
本文介绍了Deflate压缩块的结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在理解Deflate算法时遇到麻烦( RFC 1951 ).

I have troubles with understanding Deflate algorithm (RFC 1951).

TL; DR 如何解析Deflate压缩块4be4 0200?

TL; DR How to parse Deflate compressed block 4be4 0200?

我创建了一个包含字母和换行符a\n的文件,然后运行gzip a.txt.结果文件a.txt.gz:

I created a file with a letter and newline a\n in it, and run gzip a.txt. Resultant file a.txt.gz:

1f8b 0808 fe8b eb55 0003 612e 7478 7400

4be4 0200

07a1 eadd 0200 0000

我知道第一行是包含其他信息的标头,最后一行是CRC32加上输入的大小( RFC 1951 ).这两个对我来说没有任何麻烦.

I understand that first line is header with additional information, and last line is CRC32 plus size of input (RFC 1951). These two gives no trouble to me.

但是我如何解释压缩块本身(中线)?

But how do I interpret the compressed block itself (the middle line)?

以下是它的十六进制和二进制表示形式:

Here's hexadecimal and binary representation of it:

4be4 0200

0100 1011
1110 0100
0000 0010
0000 0000

据我所知,这些是不知何故的

As far as I understood, somehow these ones:

每个压缩数据块都以3个标头位开头,这些标头位包含以下数据:

Each block of compressed data begins with 3 header bits containing the following data:

  • 第一位BFINAL
  • 接下来的2位BTYPE

...实际上以第一个字节的结尾结尾:0100 1 011 . (我将跳过一个问题,为什么有人会称呼标头"实际上在其他尾部的东西呢?)

...actually ended up at the end of first byte: 0100 1011. (I'll skip the question why would anyone call "header" something which is actually at the tail of something else.)

据我所知,RFC包含的内容应对此进行解释:

RFC contains something that as far as I understand is supposed to be an explanation to this:

  • 数据元素按以下顺序打包成字节 字节内递增的位数,即开始 字节的最低有效位.
  • 包装了霍夫曼码以外的数据元素 从数据的最低有效位开始 元素.
  • 霍夫曼代码从大多数- 代码的重要部分.
  • Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the least-significant bit of the byte.
  • Data elements other than Huffman codes are packed starting with the least-significant bit of the data element.
  • Huffman codes are packed starting with the most- significant bit of the code.

换句话说,如果要以以下方式打印压缩数据: 字节序列,从第一个字节开始 right 边距,然后进行到 left ,其中最多- 像往常一样,左侧每个字节的有效位是 能够以固定宽度从右到左解析结果 正确的MSB到LSB顺序中的元素以及Huffman代码 位反转顺序(即,代码中的第一位 相对LSB位置.

In other words, if one were to print out the compressed data as a sequence of bytes, starting with the first byte at the right margin and proceeding to the left, with the most- significant bit of each byte on the left as usual, one would be able to parse the result from right to left, with fixed-width elements in the correct MSB-to-LSB order and Huffman codes in bit-reversed order (i.e., with the first bit of the code in the relative LSB position).

但是可悲的是我不明白那个解释.

But sadly I don't understand that explanation.

返回我的数据. OK,所以设置了BFINAL,而BTYPE是什么? 10还是01?

Returning to my data. OK, so BFINAL is set, and BTYPE is what? 10 or 01?

如何解释该压缩块中的其余数据?

How do I interpret the rest of the data in that compressed block?

推荐答案

首先,我们将压缩数据的十六进制表示形式看成是一系列字节(而不是您所询问的一系列16位big-endian值) ):

First lets look at the hexadecimal representation of the compressed data as a series of bytes (instead of a series of 16-bit big-endian values as in your question):

4b e4 02 00

现在让我们将那些十六进制字节转换为二进制:

Now lets convert those hexadecimal bytes to binary:

01001011 11100100 00000010 000000000

根据RFC,这些位从字节的最低有效位开始"打包.字节的最低有效位是该字节的最左位.所以第一个字节的第一位是这个:

According to the RFC, the bits are packed "starting with the least-significant bit of the byte". The least-significant bit of a byte is the left-most bit of the byte. So first bit of the first byte is this one:

01001011 11100100 00000010 000000000
       ^
       first bit

第二个是这个:

01001011 11100100 00000010 000000000
      ^
      second bit

第三位:

01001011 11100100 00000010 000000000
     ^
     third bit

以此类推.浏览完第一个字节中的所有位后,便从第二个字节的最低有效位开始.所以第九位是这个:

And so on. Once you gone through all the bits in the first byte, you then start on the least-significant bit of the second byte. So the ninth bit is this one:

01001011 11100100 00000010 000000000
                ^
                ninth bit

最后一位,即三十秒位,就是这个:

And finally the last-bit, the thirty-second bit, is this one:

01001011 11100100 00000010 000000000
                           ^
                           last bit

BFINAL值是压缩数据中的第一位,因此包含在上面标记为第一位"的单个位中.值为1,表示这是压缩数据中的最后一个块.

The BFINAL value is the first bit in the compressed data, and so is contained in the single bit marked "first bit" above. It's value is 1, which indicates that this is last block in compressed data.

BTYPE值存储在数据的后两位中.这些是上面标记为第二位"和第三位"的位.唯一的问题是两者中的哪一位是最低有效位,哪一位是最高有效位.根据RFC,除了霍夫曼代码外,其他数据元素均已打包 从数据元素的最低有效位开始."这意味着这两个位中的第一个(标记为第二位")是最低有效位.这意味着BTYPE的值是二进制的01.因此表示该块是使用固定的霍夫曼代码压缩的.

The BTYPE value is stored in the next two bits in data. These are the bits marked "second bit" and "third bit" above. The only question is which bit of the two is the least-significant bit and which is the most-significant bit. According to the RFC, "Data elements other than Huffman codes are packed starting with the least-significant bit of the data element." That means the first of these two bits, the one marked "second bit", is the least-significant bit. This means the value of BTYPE is 01 in binary. and so indicates that the block is compressed using fixed Huffman codes.

这很容易完成.解码压缩块的其余部分更加困难(对于更实际的示例,则要困难得多).正确解释该网站的答案将使该答案的时间太长(并且您的问题过于笼统).不过,我会给您一个提示,数据中的下三个元素是霍夫曼代码10010001('a'),00111010('\ n')和0000000(流的末尾).其余6位未使用,并且不属于压缩数据.

And that's the easy part done. Decoding the rest of the compressed block is more difficult (and with a more realistic example, much more difficult). Properly explaining how do that would be make this answer far too long (and your question too broad) for this site. I'll given you a hint though, the next three elements in the data are the Huffman codes 10010001 ('a'), 00111010 ('\n') and 0000000 (end of stream). The remaining 6 bits are unused, and aren't part of the compressed data.

要了解如何解码压缩数据,请务必了解霍夫曼代码是.您遵循的RFC假设您已这样做.您也应该知道 LZ77压缩的工作原理,尽管该文档或多或少地说明了您的需求要知道.

Note to understand how to decode deflate compressed data you're going to have to understand what Huffman codes are. The RFC you're following assumes that you do. You should also know how LZ77 compression works, though the document more or less explains what you need to know.

这篇关于Deflate压缩块的结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆