Python CRC-32问题 [英] Python CRC-32 woes

查看:370
本文介绍了Python CRC-32问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个Python程序,用于从6 GB bz2文件的中间提取数据。 bzip2文件由可独立解密的数据块组成,因此我只需要找到一个块(它们由魔术位分隔),然后从内存中创建一个临时的单块bzip2文件,最后将其传递给bz2.decompress函数。容易,不是吗?

I'm writing a Python program to extract data from the middle of a 6 GB bz2 file. A bzip2 file is made up of independently decryptable blocks of data, so I only need to find a block (they are delimited by magic bits), then create a temporary one-block bzip2 file from it in memory, and finally pass that to the bz2.decompress function. Easy, no?

bzip2 格式最后是文件的crc32校验和。没问题,binascii.crc32即可解救。可是等等。要校验和的数据不一定要在字节边界上结束,crc32函数必须在整数个字节上运行。

The bzip2 format has a crc32 checksum for the file at the end. No problem, binascii.crc32 to the rescue. But wait. The data to be checksummed does not necessarily end on a byte boundary, and the crc32 function operates on a whole number of bytes.

我的计划:使用binascii.crc32函数除了最后一个字节以外的所有字节,然后是我自己的函数,以最后1–7位更新计算的crc。但是数小时的编码和测试使我迷惑不解,我的困惑可以归结为这个问题:crc32( \x00)为何不为0x00000000?根据Wikipedia的文章,不是吗?

My plan: Use the binascii.crc32 function on all but the last byte, and then a function of my own to update the computed crc with the last 1–7 bits. But hours of coding and testing has left me bewildered, and my puzzlement can be boiled down to this question: How come crc32("\x00") is not 0x00000000? Shouldn't it be, according to the Wikipedia article?

从0b00000000开始,填充32个0,然后用0x04C11DB7进行多项式除法,直到前8位没有剩余为止。您的最后32位是校验和,那怎么可能不是全零?

You start with 0b00000000 and pad with 32 0's, then do polynomial division with 0x04C11DB7 until there are no ones left in the first 8 bits, which is immediately. Your last 32 bits is the checksum, and how can that not be all zeroes?

我已经在Google上搜索了答案,并查看了几种CRC-32实现的代码

I've searched Google for answers and looked at the code of several CRC-32 implementations without finding any clue to why this is so.

推荐答案


为什么会出现crc32( \x00 )不是0x00000000?

how come crc32("\x00") is not 0x00000000?

基本CRC算法是将输入消息视为GF(2)中的多项式,除以

The basic CRC algorithm is to treat the input message as a polynomial in GF(2), divide by the fixed CRC polynomial, and use the polynomial remainder as the resulting hash.

CRC-32对基本算法进行了许多修改:

CRC-32 makes a number of modifications on the basic algorithm:


  1. 消息的每个字节中的位都反转。例如,字节0x01被视为多项式x ^ 7,而不是多项式x ^ 0。

  2. 该消息在右侧填充了32个零。

  3. 此反转和填充的消息的前4个字节与0xFFFFFFFF进行异或。

  4. 其余多项式被反转。

  5. 剩余多项式与0xFFFFFFFF进行异或。

  6. 并且请记住,CRC-32多项式的形式为非反转形式,为0x104C11DB7。

  1. The bits in each byte of the message is reversed. For example, the byte 0x01 is treated as the polynomial x^7, not as the polynomial x^0.
  2. The message is padded with 32 zeros on the right side.
  3. The first 4 bytes of this reversed and padded message is XOR'd with 0xFFFFFFFF.
  4. The remainder polynomial is reversed.
  5. The remainder polynomial is XOR'd with 0xFFFFFFFF.
  6. And recall that the CRC-32 polynomial, in non-reversed form, is 0x104C11DB7.

我们来计算一字节字符串0x00的CRC-32:

Let's work out the CRC-32 of the one-byte string 0x00:


  1. 消息:0x00

  2. 已反转:0x00

  3. 已填充:0x00 00 00 00 00

  4. 已异或:0xFF FF FF FF 00

  5. 除以0x104C11DB7所得的余数:0x4E 08 BF B4

  6. XOR'd:0xB1 F7 40 4B

  7. 反向:0xD2 02 EF 8D

  1. Message: 0x00
  2. Reversed: 0x00
  3. Padded: 0x00 00 00 00 00
  4. XOR'd: 0xFF FF FF FF 00
  5. Remainder when divided by 0x104C11DB7: 0x4E 08 BF B4
  6. XOR'd: 0xB1 F7 40 4B
  7. Reversed: 0xD2 02 EF 8D

在那里,您已经知道了:0x00的CRC-32是0xD202EF8D 。

(您应对此进行验证。)

And there you have it: The CRC-32 of 0x00 is 0xD202EF8D.
(You should verify this.)

这篇关于Python CRC-32问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆