Hashing raw bytes in Python and Java produces different results

Problem description

I'm trying to replicate the behavior of a Python 2.7 function in Java, but I'm getting different results when running a (seemingly) identical sequence of bytes through a SHA-256 hash. The bytes are generated by manipulating a very large integer (exactly 2048 bits long) in a specific way (2nd line of my Python code example).

For my examples, the original 2048-bit integer is stored as big_int and bigInt in Python and Java respectively, and both variables contain the same number.

Python2 code I'm trying to replicate:

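import hashlib
import struct
from pprint import pprint

# big_int holds the 2048-bit integer (defined elsewhere)
# Format the integer as hex, then decode the hex string into raw bytes (a Python 2 str)
raw_big_int = ("%x" % big_int).decode("hex")

# 4-byte big-endian length prefix, then a single 0x00 byte, then the raw bytes
buff = struct.pack(">i", len(raw_big_int) + 1) + "\x00" + raw_big_int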

pprint("Buffer contains: " + buff)
pprint("Encoded: " + buff.encode("hex").upper())

digest = hashlib.sha256(buff).digest()

pprint("Digest contains: " + digest)
pprint("Encoded: " + digest.encode("hex").upper())

Running this code prints the following (note that the only result I'm actually interested in is the last one - the hex-encoded digest. The other 3 prints are just to see what's going on under the hood):

'Buffer contains: \x00\x00\x01\x01\x00\xe3\xbb\xd3\x84\x94P\xff\x9c\'\xd0P\xf2\xf0s,a^\xf0i\xac~\xeb\xb9_\xb0m\xa2&f\x8d~W\xa0\xb3\xcd\xf9\xf0\xa8\xa2\x8f\x85\x02\xd4&\x7f\xfc\xe8\xd0\xf2\xe2y"\xd0\x84ck\xc2\x18\xad\xf6\x81\xb1\xb0q\x19\xabd\x1b>\xc8$g\xd7\xd2g\xe01\xd4r\xa3\x86"+N\\\x8c\n\xb7q\x1c \x0c\xa8\xbcW\x9bt\xb0\xae\xff\xc3\x8aG\x80\xb6\x9a}\xd9*\x9f\x10\x14\x14\xcc\xc0\xb6\xa9\x18*\x01/eC\x0eQ\x1b]\n\xc2\x1f\x9e\xb6\x8d\xbfb\xc7\xce\x0c\xa1\xa3\x82\x98H\x85\xa1\\\xb2\xf1\'\xafmX|\x82\xe7%\x8f\x0eT\xaa\xe4\x04*\x91\xd9\xf4e\xf7\x8c\xd6\xe5\x84\xa8\x01*\x86\x1cx\x8c\xf0d\x9cOs\xebh\xbc1\xd6\'\xb1\xb0\xcfy\xd7(\x8b\xeaIf6\xb4\xb7p\xcdgc\xca\xbb\x94\x01\xb5&\xd7M\xf9\x9co\xf3\x10\x87U\xc3jB3?vv\xc4JY\xc9>\xa3cec\x01\x86\xe9c\x81F-\x1d\x0f\xdd\xbf\xe8\xe9k\xbd\xe7c5'
'Encoded: 0000010100E3BBD3849450FF9C27D050F2F0732C615EF069AC7EEBB95FB06DA226668D7E57A0B3CDF9F0A8A28F8502D4267FFCE8D0F2E27922D084636BC218ADF681B1B07119AB641B3EC82467D7D267E031D472A386222B4E5C8C0AB7711C200CA8BC579B74B0AEFFC38A4780B69A7DD92A9F101414CCC0B6A9182A012F65430E511B5D0AC21F9EB68DBF62C7CE0CA1A382984885A15CB2F127AF6D587C82E7258F0E54AAE4042A91D9F465F78CD6E584A8012A861C788CF0649C4F73EB68BC31D627B1B0CF79D7288BEA496636B4B770CD6763CABB9401B526D74DF99C6FF3108755C36A42333F7676C44A59C93EA36365630186E96381462D1D0FDDBFE8E96BBDE76335'
'Digest contains: Q\xf9\xb9\xaf\xe1\xbey\xdc\xfa\xc4.\xa9 \xfckz\xfeB\xa0>\xb3\xd6\xd0*S\xff\xe1\xe5*\xf0\xa3i'
'Encoded: 51F9B9AFE1BE79DCFAC42EA920FC6B7AFE42A03EB3D6D02A53FFE1E52AF0A369'

Now, below is my Java code so far. When I test it, I get the same value for the input buffer, but a different value for the digest. (bigInt contains a BigInteger object containing the same number as big_int in the Python example above)

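import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.MessageDigest;
import javax.xml.bind.DatatypeConverter;

byte[] rawBigInt = bigInt.toByteArray();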

ByteBuffer buff = ByteBuffer.allocate(rawBigInt.length + 4);
buff.order(ByteOrder.BIG_ENDIAN);
buff.putInt(rawBigInt.length).put(rawBigInt);

System.out.print("Buffer contains: ");
System.out.println( DatatypeConverter.printHexBinary(buff.array()) );


MessageDigest hash = MessageDigest.getInstance("SHA-256");
hash.update(buff);
byte[] digest = hash.digest();

System.out.print("Digest contains: ");
System.out.println( DatatypeConverter.printHexBinary(digest) );

Notice that in my Python example, I started the buffer off with len(raw_big_int) + 1 packed, where in Java I started with just rawBigInt.length. I also omitted the extra 0-byte ("\x00") when writing in Java. I did both of these for the same reason - in my tests, calling toByteArray() on a BigInteger returned a byte array already beginning with a 0-byte that was exactly 1 byte longer than Python's byte sequence. So, at least in my tests, len(raw_big_int) + 1 equaled rawBigInt.length, since rawBigInt began with a 0-byte and raw_big_int did not.
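The shortcut above relies on toByteArray() adding a leading 0-byte, which BigInteger only does when the most significant bit of the magnitude is set (so that the two's-complement encoding stays non-negative). A value that is exactly 2048 bits long always has that bit set, so the shortcut holds for this input, but for a value such as 0x7F12 the Java length prefix would no longer match Python's. The following is a minimal sketch, assuming a non-negative BigInteger, that mirrors the Python framing for any such value; the class and method names are hypothetical. One caveat: Python 2's .decode("hex") raises an error when "%x" produces an odd number of hex digits, and this sketch does not reproduce that error.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PythonStyleFraming {

    // Approximates ("%x" % big_int).decode("hex") for a non-negative value:
    // the unsigned magnitude bytes, without BigInteger's extra sign byte.
    static byte[] magnitudeBytes(BigInteger value) {
        byte[] raw = value.toByteArray();
        // toByteArray() prepends 0x00 only when the top bit of the first
        // magnitude byte is set, so the two's-complement value stays positive.
        if (raw.length > 1 && raw[0] == 0) {
            return Arrays.copyOfRange(raw, 1, raw.length);
        }
        return raw;
    }

    // Mirrors struct.pack(">i", len(raw) + 1) + "\x00" + raw
    static byte[] pythonStyleBuffer(BigInteger value) {
        byte[] raw = magnitudeBytes(value);
        ByteBuffer buff = ByteBuffer.allocate(4 + 1 + raw.length); // big-endian by default
        buff.putInt(raw.length + 1).put((byte) 0).put(raw);
        // The buffer is filled exactly, so the backing array holds the whole message.
        return buff.array();
    }

    public static void main(String[] args) {
        // 0x7F12 has a clear top bit, so toByteArray() adds no 0x00 of its own
        byte[] framed = pythonStyleBuffer(new BigInteger("7F12", 16));
        System.out.println(Arrays.toString(framed)); // prints [0, 0, 0, 3, 0, 127, 18]
    }
}
```

Always writing the 0x00 byte explicitly, rather than relying on toByteArray() to supply it, keeps the Java output tied to the Python format instead of to BigInteger's sign-encoding rules.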

Alright, that aside, here is the Java code's output:

Buffer contains: 0000010100E3BBD3849450FF9C27D050F2F0732C615EF069AC7EEBB95FB06DA226668D7E57A0B3CDF9F0A8A28F8502D4267FFCE8D0F2E27922D084636BC218ADF681B1B07119AB641B3EC82467D7D267E031D472A386222B4E5C8C0AB7711C200CA8BC579B74B0AEFFC38A4780B69A7DD92A9F101414CCC0B6A9182A012F65430E511B5D0AC21F9EB68DBF62C7CE0CA1A382984885A15CB2F127AF6D587C82E7258F0E54AAE4042A91D9F465F78CD6E584A8012A861C788CF0649C4F73EB68BC31D627B1B0CF79D7288BEA496636B4B770CD6763CABB9401B526D74DF99C6FF3108755C36A42333F7676C44A59C93EA36365630186E96381462D1D0FDDBFE8E96BBDE76335
Digest contains: E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855

As you can see, the buffer contents appear the same in both Python and Java, but the digests are obviously different. Can someone point out where I'm going wrong?

I suspect it has something to do with the strange way Python seems to store bytes - the variables raw_big_int and buff show as type str in the interpreter, and when printed out by themselves have that strange format with the '\x's that is almost the same as the bytes themselves in some places, but is utter gibberish in others. I don't have enough Python experience to understand exactly what's going on here, and my searches have turned up fruitless.
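In Python 2, str is a plain byte string, and printing it through repr() shows each byte either as a printable ASCII character or as an escape sequence. The parts that look like gibberish are simply bytes that happen to fall in the printable ASCII range: the hex dump's 94 50 FF appears as \x94P\xff because 0x50 is ASCII "P". The sketch below approximates that display in Java under a simplified rule set (it does not escape quote characters, as the real repr() does); PythonReprSketch and pythonStyleEscape are hypothetical names.

```java
public class PythonReprSketch {

    // Approximates how Python 2 displays a byte string inside repr():
    // printable ASCII bytes (0x20..0x7E) appear as themselves, a few control
    // bytes get named escapes, and everything else becomes \xNN in lowercase hex.
    // Simplification: quote characters are not escaped here.
    static String pythonStyleEscape(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if (v == '\\') {
                sb.append("\\\\");
            } else if (v == '\n') {
                sb.append("\\n");
            } else if (v == '\r') {
                sb.append("\\r");
            } else if (v == '\t') {
                sb.append("\\t");
            } else if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02x", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes 94 50 FF from the hex dump; 0x50 is ASCII 'P'
        byte[] sample = {(byte) 0x94, 0x50, (byte) 0xFF};
        System.out.println(pythonStyleEscape(sample)); // prints \x94P\xff
    }
}
```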

Also, since I'm trying to port the Python code into Java, I can't just change the Python - my goal is to write Java code that takes the same input and produces the same output. I've searched around (this question in particular seemed related) but didn't find anything to help me out. Thanks in advance, if for nothing else than for reading this long-winded question! :)

Recommended answer

In Java, you've got the data in the buffer, but the cursor positions are all wrong. After you've written your data to the ByteBuffer it looks like this, where the x's represent your data and the 0's are unwritten bytes in the buffer:

xxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000000000000
                    ^ position                               ^ limit

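The cursor is positioned after the data you've written. A read at this point will read from position to limit, which covers only the bytes you haven't written. In your code the buffer is allocated with exactly rawBigInt.length + 4 bytes, so after the writes position equals limit and nothing remains to be read at all: MessageDigest.update(ByteBuffer) hashes zero bytes, and E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855 is the SHA-256 digest of empty input. Your buffer printout still looked correct because buff.array() returns the entire backing array, regardless of position and limit.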
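To see the effect in isolation, the following sketch reproduces the question's pattern with a small hypothetical payload: it fills an exactly sized ByteBuffer, passes it to update() without flipping it, and compares the result with the digest of an empty array. MissingFlipDemo and its helpers are illustrative names, not part of the original code.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MissingFlipDemo {

    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Reproduces the question's pattern: write into an exactly sized buffer,
    // then pass it to update() without calling flip() first.
    static byte[] digestWithoutFlip(byte[] payload) {
        ByteBuffer buff = ByteBuffer.allocate(payload.length);
        buff.put(payload);        // position == limit == capacity
        MessageDigest hash = sha256();
        hash.update(buff);        // consumes buff.remaining() bytes, which is 0
        return hash.digest();
    }

    public static void main(String[] args) {
        byte[] payload = {0x00, 0x00, 0x01, 0x01, 0x00};
        System.out.println(toHex(digestWithoutFlip(payload)));
        System.out.println(toHex(sha256().digest(new byte[0]))); // same value: SHA-256 of empty input
    }
}
```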

Instead, you need it to look like this:

xxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000000000000
^ position          ^ limit

where the position is 0 and the limit is the number of bytes you've written. To get there, call flip(). Flipping a buffer conceptually switches it from write mode to read mode. I say "conceptually" because ByteBuffers don't have explicit read and write modes, but you should think of them as if they do.
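Applied to the question's Java code, the fix is a single flip() between the writes and update(). The sketch below wraps that code in a hypothetical FlippedDigest class so it can run on its own; the framing is kept exactly as in the question, so it still relies on toByteArray() supplying the leading 0-byte.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FlippedDigest {

    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    // The question's code with flip() added before hashing.
    static byte[] digestLikeQuestion(BigInteger bigInt) {
        byte[] rawBigInt = bigInt.toByteArray();
        ByteBuffer buff = ByteBuffer.allocate(rawBigInt.length + 4);
        buff.order(ByteOrder.BIG_ENDIAN);
        buff.putInt(rawBigInt.length).put(rawBigInt);
        buff.flip();          // limit = bytes written, position = 0
        MessageDigest hash = sha256();
        hash.update(buff);    // now reads all rawBigInt.length + 4 bytes
        return hash.digest();
    }
}
```

Because this particular buffer is filled to capacity, passing buff.array() to update() would hash the same bytes; flip() is the more general choice, since it stays correct when the buffer is larger than the data written.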

(The opposite operation is compact(), which goes back to write mode.)
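As a small illustration of that round trip, the sketch below writes five bytes, flips the buffer, reads two of them, and then calls compact(), which moves the three unread bytes to the front and leaves the buffer positioned for more writes. CompactDemo is a hypothetical name; clear() would also return to write mode, but it discards any unread bytes instead of keeping them.

```java
import java.nio.ByteBuffer;

public class CompactDemo {

    // Writes five bytes, reads two of them, then compacts the buffer.
    static ByteBuffer readTwoThenCompact() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put(new byte[] {1, 2, 3, 4, 5}); // writing: position 5, limit 8
        buf.flip();                           // reading: position 0, limit 5
        buf.get();
        buf.get();                            // two bytes consumed: position 2
        buf.compact();                        // unread bytes 3, 4, 5 move to indices 0..2
        return buf;                           // writing again: position 3, limit 8
    }

    public static void main(String[] args) {
        ByteBuffer buf = readTwoThenCompact();
        System.out.println(buf.position() + " " + buf.limit() + " " + buf.get(0)); // prints "3 8 3"
    }
}
```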
