Bzip2 block header: 1AY&SY

Question

This is a question about the bzip2 archive format. Any bzip2 archive consists of a file header, one or more blocks, and a trailer. Every block should start with "1AY&SY", the 6 bytes 0x314159265359, which are the BCD-encoded digits of pi. According to the bzip2 source:

/*--
  A 6-byte block header, the value chosen arbitrarily
  as 0x314159265359 :-).  A 32 bit value does not really
  give a strong enough guarantee that the value will not
  appear by chance in the compressed datastream.  Worst-case
  probability of this event, for a 900k block, is about
  2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
  For a compressed file of size 100Gb -- about 100000 blocks --
  only a 48-bit marker will do.  NB: normal compression/
  decompression do *not* rely on these statistical properties.
  They are only important when trying to recover blocks from
  damaged files.
--*/
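As a quick check of the claim that the 48-bit value and the ASCII text "1AY&SY" are the same six bytes, here is a minimal Python sketch. The names BLOCK_MAGIC and magic_bytes are illustrative only and do not come from the bzip2 source.

```python
# The 48-bit block marker quoted in the bzip2 source comment.
BLOCK_MAGIC = 0x314159265359

# Write it out as 6 big-endian bytes and view those bytes as ASCII.
magic_bytes = BLOCK_MAGIC.to_bytes(6, "big")
magic_text = magic_bytes.decode("ascii")

print(magic_bytes.hex())  # the BCD digits of pi, two digits per byte
print(magic_text)         # the same bytes read as ASCII characters
```

Each byte holds two decimal digits of pi (31 41 59 26 53 59), and those byte values happen to be printable ASCII, which is why the marker shows up as readable text in a hex dump.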

The question is: is it true that every bzip2 archive has blocks whose starts are aligned to a byte boundary? I mean all archives created by the reference implementation of bzip2, the bzip2-1.0.5+ utility.

I think that bzip2 may parse the stream not as a byte stream but as a bit stream (the block itself is Huffman-encoded, which is not byte-aligned by design).

So, in other words: is the count reported by grep -c 1AY&SY greater than or equal to the number of bzip2 blocks in the file (greater because Huffman coding may produce 1AY&SY inside a block)?

Recommended answer

bzip2 treats the data as a bit stream, so block headers are not guaranteed to start on a byte boundary.

From http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html


Anyway, the important bits are that a BZIP2 file contains one or more "streams", which are byte aligned, each containing one (zero?) or more "blocks", which are not byte aligned, followed by an end of stream marker (the six bytes 0x177245385090 which is the square root of pi as a binary coded decimal (BCD), a four byte checksum, and empty bits for byte alignment).

The bzip2 Wikipedia article also alludes to bit-level block alignment (see the File Format section), which seems to be in line with what I remember from school (I had to implement the algorithm...).
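To see the difference between a byte-oriented search and a bit-oriented one, the following is a minimal Python sketch that slides a 48-bit window over a compressed buffer one bit at a time, most significant bit first, and records every position where the block marker appears. The function name find_block_magics and the test payload are assumptions made for illustration. The scan is similar in spirit to what a recovery tool has to do, but it is not taken from the bzip2 source.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359   # "1AY&SY"
MASK48 = (1 << 48) - 1


def find_block_magics(data: bytes, magic: int = BLOCK_MAGIC) -> list:
    """Return the bit offsets at which the 48-bit marker occurs in data."""
    offsets = []
    window = 0
    bit_index = 0
    for byte in data:
        for shift in range(7, -1, -1):          # MSB first within each byte
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            bit_index += 1
            if bit_index >= 48 and window == magic:
                offsets.append(bit_index - 48)  # offset of the marker's first bit
    return offsets


# Hypothetical test input: pseudo-random bytes compressed at level 1
# (100k block size), so the stream contains more than one block.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(250_000))
compressed = bz2.compress(payload, 1)

offsets = find_block_magics(compressed)
aligned = [o for o in offsets if o % 8 == 0]
print(len(offsets), "markers found by bit scan,", len(aligned), "byte-aligned")
print("byte-level search finds", compressed.count(b"1AY&SY"))
```

The first marker always sits right after the 4-byte stream header, so it is byte-aligned. Later ones depend on where the previous block happened to end. A byte-level search such as grep can therefore see only the aligned subset, and grep -c additionally counts matching lines rather than individual occurrences.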
