Calculate/validate bz2 (bzip2) CRC32 in Python

Question

I'm trying to calculate/validate the CRC32 checksums for compressed bzip2 archives.

.magic:16                       = 'BZ' signature/magic number
.version:8                      = 'h' for Bzip2 ('H'uffman coding)
.hundred_k_blocksize:8          = '1'..'9' block-size 100 kB-900 kB

.compressed_magic:48            = 0x314159265359 (BCD (pi))
.crc:32                         = checksum for this block
...
... 
.eos_magic:48                   = 0x177245385090 (BCD sqrt(pi))
.crc:32                         = checksum for whole stream
.padding:0..7                   = align to whole byte

http://en.wikipedia.org/wiki/Bzip2
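
As a rough illustration of the layout above, the following sketch reads the stream header and the CRC of the first block from a freshly compressed buffer. The function name `read_first_block_crc` is made up for this example, and the fixed byte offsets only hold for the first block, which starts right after the 4-byte stream header; later blocks and the end-of-stream marker are generally not byte-aligned.

```python
import bz2

BLOCK_MAGIC = bytes.fromhex("314159265359")  # .compressed_magic (BCD pi)

def read_first_block_crc(raw):
    """Return (level, crc) read from the start of a bzip2 stream in `raw`.

    Hypothetical helper for illustration; it only looks at the first block.
    """
    if raw[:3] != b"BZh":
        raise ValueError("not a bzip2 stream")
    level = raw[3] - ord("0")                # '1'..'9' -> 1..9
    if raw[4:10] != BLOCK_MAGIC:
        raise ValueError("no block header at byte 4 (empty stream?)")
    crc = int.from_bytes(raw[10:14], "big")  # .crc:32, stored MSB first
    return level, crc

raw = bz2.compress(b"justatest")
level, crc = read_first_block_crc(raw)
print(level, format(crc, "08X"))
```

The CRC sits at bytes 10 to 13, i.e. bits 80 to 111, directly after the 48-bit block magic.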

So I know where the CRC checksums are in a bz2 file, but how would I go about validating them? Which chunks should I pass to binascii.crc32() to get both CRCs? I've tried calculating the CRC of various chunks, byte by byte, but have not managed to get a match.

Thank you. I'll be looking into the bzip2 sources and bz2 Python library code, to maybe find something, especially in the decompress() method.

Update 1:

The block headers are identified by the following tags as far as I can see. But tiny bz2 files do not contain the ENDMARK ones. (Thanks to adw, we've found out that one should look for bit shifted values of the ENDMARK, since the compressed data is not padded to bytes.)
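
Searching for bit-shifted markers can be sketched as below. This is a minimal illustration for small files only (it converts the whole file into one big integer), and the helper names `find_marker` and `read_bits` are invented for this example. It assumes the bit stream is packed most significant bit first, which matches the byte values seen in the header.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # block header (BCD pi)
EOS_MAGIC = 0x177245385090    # end-of-stream marker (BCD sqrt(pi))

def read_bits(raw, pos, width):
    """Read `width` bits starting at bit offset `pos`, MSB first."""
    total = len(raw) * 8
    value = int.from_bytes(raw, "big")
    return (value >> (total - width - pos)) & ((1 << width) - 1)

def find_marker(raw, marker, width=48):
    """Return every bit offset at which the `width`-bit `marker` occurs."""
    total = len(raw) * 8
    value = int.from_bytes(raw, "big")
    mask = (1 << width) - 1
    return [pos for pos in range(total - width + 1)
            if (value >> (total - width - pos)) & mask == marker]

raw = bz2.compress(b"justatest")
block_start = find_marker(raw, BLOCK_MAGIC)[0]   # first block header
eos_start = find_marker(raw, EOS_MAGIC)[-1]      # marker closest to the end
block_crc = read_bits(raw, block_start + 48, 32)
stream_crc = read_bits(raw, eos_start + 48, 32)
print(block_start, eos_start, format(block_crc, "08X"), format(stream_crc, "08X"))
```

In the bzip2 sources the stream CRC is built by rotating the running value left by one bit and XORing in each block CRC, starting from zero, so for a single-block file such as this one the two values should be identical.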

#define BLOCK_HEADER_HI  0x00003141UL
#define BLOCK_HEADER_LO  0x59265359UL

#define BLOCK_ENDMARK_HI 0x00001772UL
#define BLOCK_ENDMARK_LO 0x45385090UL

These come from the bzip2recover.c source. Blocks always seem to start at bit 80, right before the CRC checksum. The checksum field itself must be left out of the CRC calculation, since a CRC cannot cover its own value (you get my point).

searching for block boundaries ...
block 1 runs from 80 to 1182

Looking into the code that calculates this.

Update 2:

bzip2recover.c does not contain the CRC calculating functions; it just copies the CRC from the damaged files. However, I did manage to replicate its block-finding functionality in Python, to mark out the starting and ending bits of each block in a bz2 compressed file. Back on track, I have found that compress.c refers to some of the definitions in bzlib_private.h.

#define BZ_INITIALISE_CRC(crcVar) crcVar = 0xffffffffL;
#define BZ_FINALISE_CRC(crcVar) crcVar = ~(crcVar);
#define BZ_UPDATE_CRC(crcVar,cha)              \
{                                              \
   crcVar = (crcVar << 8) ^                    \
            BZ2_crc32Table[(crcVar >> 24) ^    \
                           ((UChar)cha)];      \
}

These definitions are used by bzlib.c as well: s->blockCRC is initialized and updated in bzlib.c and finalized in compress.c. There are more than 2000 lines of C code, which will take some time to look through to figure out what goes in and what does not. I'm adding the C tag to the question as well.

By the way, here are the C sources for bzip2 http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz

Update 3:

It turns out that the bzip2 block CRC32 is calculated using the following algorithm:

dataIn is the data to be encoded.

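def bz2_crc32(dataIn):
    # dataIn: bytes object holding the data to checksum
    # BZ2_crc32Table: the 256-entry table from crctable.c
    crcVar = 0xffffffff  # Init
    for cha in dataIn:  # iterating over bytes yields ints in Python 3
        crcVar = crcVar & 0xffffffff  # Unsigned
        crcVar = (crcVar << 8) ^ BZ2_crc32Table[(crcVar >> 24) ^ cha]

    # Finalise: invert, keep 32 bits, format as 8 upper-case hex digits
    return format(~crcVar & 0xffffffff, '08X')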

where BZ2_crc32Table is defined in crctable.c.

For dataIn = "justatest" the CRC returned is 7948C8CB, having compressed a textfile with that data, the crc:32 checksum inside the bz2 file is 79 48 c8 cb which is a match.

Conclusion:

bzlib2 CRC32 is (quoting crctable.c)

Vaguely derived from code by Rob Warnock, in Section 51 of the comp.compression FAQ...

...thus, as far as I understand, cannot be precalculated/validated using standard CRC32 checksum calculators, but rather require the bz2lib implementation (lines 155-172 in bzlib_private.h).
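
A possible workaround that is not part of the original post: the bzip2 CRC appears to use the same polynomial as the common zlib CRC32, only processed most significant bit first. Under that assumption, binascii.crc32() can still be used if every input byte is bit-reversed beforehand and the 32-bit result is bit-reversed afterwards. The sketch below (the name `bz2_crc32_via_binascii` is hypothetical) compares the result against the CRC that Python's own bz2 module stores in a single-block stream.

```python
import binascii
import bz2

# Lookup table that mirrors the bit order of every possible byte value.
REVERSE_BYTE = bytes(int(format(b, "08b")[::-1], 2) for b in range(256))

def bz2_crc32_via_binascii(data):
    """bzip2-style CRC32 built from binascii.crc32 plus bit reversal."""
    reflected = binascii.crc32(data.translate(REVERSE_BYTE)) & 0xFFFFFFFF
    return int(format(reflected, "032b")[::-1], 2)  # mirror the 32-bit result

payload = b"justatest"
stored = int.from_bytes(bz2.compress(payload)[10:14], "big")
computed = bz2_crc32_via_binascii(payload)
print(format(computed, "08X"), format(stored, "08X"))
```

Both printed values should agree; the input is the uncompressed payload, not the compressed bytes.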

Recommended answer

The following is the CRC algorithm used by bzip2, written in Python:

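def bz2_crc32(dataIn):
    # dataIn: bytes object holding the data to checksum
    # BZ2_crc32Table: the 256-entry table from crctable.c
    crcVar = 0xffffffff  # Init
    for cha in dataIn:  # iterating over bytes yields ints in Python 3
        crcVar = crcVar & 0xffffffff  # Unsigned
        crcVar = (crcVar << 8) ^ BZ2_crc32Table[(crcVar >> 24) ^ cha]

    # Finalise: invert, keep 32 bits, format as 8 upper-case hex digits
    return format(~crcVar & 0xffffffff, '08X')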

(The C definitions can be found on lines 155-172 of bzlib_private.h.)

The BZ2_crc32Table array/list can be found in crctable.c from the bzip2 source code. This CRC checksum algorithm is, quoting: "..vaguely derived from code by Rob Warnock, in Section 51 of the comp.compression FAQ..." (crctable.c)

The checksum is calculated over the uncompressed data.
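
For a self-contained check that does not require copying the table out of crctable.c, the table can be generated instead. The sketch below assumes the table corresponds to the polynomial 0x04C11DB7 processed most significant bit first, which is worth verifying against the first few entries of crctable.c. The names `make_crc32_table`, `bz2_crc32` and `first_block_crc_matches` are illustrative only, and the comparison is only meaningful for a stream that contains a single block.

```python
import bz2

def make_crc32_table(poly=0x04C11DB7):
    """Build a 256-entry MSB-first CRC-32 table for `poly` (assumed value)."""
    table = []
    for i in range(256):
        reg = i << 24
        for _ in range(8):
            if reg & 0x80000000:
                reg = ((reg << 1) ^ poly) & 0xFFFFFFFF
            else:
                reg = (reg << 1) & 0xFFFFFFFF
        table.append(reg)
    return table

CRC_TABLE = make_crc32_table()

def bz2_crc32(data):
    """CRC of `data` (bytes) following the BZ_*_CRC macros; returns an int."""
    crc = 0xFFFFFFFF                                   # BZ_INITIALISE_CRC
    for byte in data:                                  # BZ_UPDATE_CRC
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[(crc >> 24) ^ byte]
    return ~crc & 0xFFFFFFFF                           # BZ_FINALISE_CRC

def first_block_crc_matches(raw):
    """Compare the stored first-block CRC with one computed from the
    decompressed data; only valid for a single-block stream."""
    stored = int.from_bytes(raw[10:14], "big")
    return stored == bz2_crc32(bz2.decompress(raw))

print(first_block_crc_matches(bz2.compress(b"justatest")))
```

For files with several blocks, each block CRC covers only that block's share of the uncompressed data, so the decompressed output would have to be split at the block boundaries first.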

The sources can be downloaded here: http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz
