Bzip2 block header: 1AY&SY


Problem description

This is a question about the bzip2 archive format. Any bzip2 archive consists of a file header, one or more blocks, and a tail structure. Every block should start with "1AY&SY", the six bytes 0x314159265359, which are the BCD-encoded digits of pi. According to the bzip2 source:

/*--
  A 6-byte block header, the value chosen arbitrarily
  as 0x314159265359 :-).  A 32 bit value does not really
  give a strong enough guarantee that the value will not
  appear by chance in the compressed datastream.  Worst-case
  probability of this event, for a 900k block, is about
  2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
  For a compressed file of size 100Gb -- about 100000 blocks --
  only a 48-bit marker will do.  NB: normal compression/
  decompression do *not* rely on these statistical properties.
  They are only important when trying to recover blocks from
  damaged files.
--*/
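
As a quick sanity check of the comment above, here is a minimal Python sketch. It shows that the six marker bytes read as the ASCII string "1AY&SY", and it estimates the chance of the marker appearing by accident in one block. The estimate (number of bit positions in a 900,000-byte block divided by 2^n) is my own assumption about where the quoted figures come from; the source comment does not show its derivation, and `chance_match_estimate` is a hypothetical helper name.

```python
# The 48-bit block marker from the bzip2 source comment.
BLOCK_MAGIC = bytes.fromhex("314159265359")


def chance_match_estimate(marker_bits: int, block_bytes: int = 900_000) -> float:
    """Rough upper bound on a marker of `marker_bits` bits showing up by
    chance somewhere in a block of `block_bytes` bytes.

    Assumption (not from the bzip2 source): the data looks random and the
    marker may start at any bit position, so the bound is
    (number of bit positions) / 2**marker_bits.
    """
    bit_positions = block_bytes * 8
    return bit_positions / 2 ** marker_bits


if __name__ == "__main__":
    print(BLOCK_MAGIC)  # the six marker bytes shown as ASCII
    for n in (32, 40, 48):
        print(n, f"{chance_match_estimate(n):.1e}")
```

Under that assumption the estimates come out somewhat below the worst-case figures quoted in the comment, but in the same order of magnitude.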

The question is: do the blocks in every bzip2 archive start on a byte boundary? I mean all archives created by the reference implementation of bzip2, the bzip2-1.0.5+ utility.

I think bzip2 may parse the stream as a bit stream rather than a byte stream (the block itself is Huffman-coded, which is not byte-aligned by design).

So, in other words: is the count reported by grep -c 1AY&SY greater than (Huffman coding may produce 1AY&SY inside a block) or equal to the number of bzip2 blocks in the file?

Recommended answer

BZIP2 looks at a bit stream.

From http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html:

Anyway, the important bits are that a BZIP2 file contains one or more "streams", which are byte aligned, each containing one (zero?) or more "blocks", which are not byte aligned, followed by an end of stream marker (the six bytes 0x177245385090 which is the square root of pi as a binary coded decimal (BCD), a four byte checksum, and empty bits for byte alignment).

The bzip2 Wikipedia article also alludes to bit-level block alignment (see the File Format section), which seems to be in line with what I remember from school (I had to implement the algorithm...).
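
Since blocks are not byte aligned, counting them means looking for the 48-bit marker at every bit offset, not every byte offset. The following is a minimal sketch of that idea, not the method bzip2 itself uses; `find_block_bit_offsets` and `make_sample` are hypothetical names, and I am assuming bits are packed most-significant-bit first within each byte. It uses Python's standard `bz2` module to produce a small multi-block sample.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359   # start-of-block marker ("1AY&SY")
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_block_bit_offsets(data: bytes, magic: int = BLOCK_MAGIC) -> list:
    """Return every bit offset in `data` where the 48-bit `magic` starts.

    A 48-bit sliding window is shifted one bit at a time, so markers that
    do not start on a byte boundary are found as well.
    """
    offsets = []
    window = 0
    bit_index = 0
    for byte in data:
        for shift in range(7, -1, -1):  # assumed MSB-first bit order
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bit_index += 1
            if bit_index >= MAGIC_BITS and window == magic:
                offsets.append(bit_index - MAGIC_BITS)
    return offsets


def make_sample(size: int = 250_000, seed: int = 0) -> bytes:
    """Compress pseudo-random bytes at level 1 (100k blocks) so that the
    result spans more than one block."""
    payload = random.Random(seed).getrandbits(8 * size).to_bytes(size, "big")
    return bz2.compress(payload, compresslevel=1)


if __name__ == "__main__":
    compressed = make_sample()
    offsets = find_block_bit_offsets(compressed)
    byte_aligned = [o for o in offsets if o % 8 == 0]
    print("markers found at bit level:", len(offsets))
    print("of which byte aligned:", len(byte_aligned))
    print("byte-level search finds:", compressed.count(b"1AY&SY"))
```

A byte-oriented search such as grep only sees the byte-aligned occurrences, so it can report fewer matches than there are blocks. Note also that grep -c counts matching lines rather than occurrences. A bit-level hit is still not proof of a block start, because the same 48 bits can occur by chance inside compressed data, as the source comment points out.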
