Are the first 32 bits of an md5 hash just as "random" as any other substring?
Problem description
I'm looking to create a 32-bit hash of some data objects. Since I don't feel like writing my own hash function and md5 is available, my current approach is to use the first 32 bits (i.e. first 8 hex digits) from an md5 hash. Is this acceptable?
In other words, are the first 32 bits of an md5 hash just as "random" as any other substring? Or is there any reason I'd prefer, say, the last 32 bits? Or perhaps XOR'ing the four 32-bit substrings together?
Some preemptive clarifications:
- These hashes don't need to be cryptographically secure.
- I'm not concerned with the performance of md5--it is more than fast enough for my needs.
- These hashes just need to be "random" enough that collisions are rare.
- In this system, the number of items shouldn't exceed 10,000 (realistically it's probably not going to get half that high). So in the worst case the probability of encountering any collisions at all should be about 1% (assuming a sufficiently "random" hash is found).
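The "about 1%" estimate can be sanity-checked with the standard birthday approximation. This is a minimal sketch; the function name `collision_probability` is made up for illustration, and it assumes the hash output is uniformly distributed over all 2^32 values.

```python
import math

def collision_probability(n_items: int, hash_bits: int = 32) -> float:
    """Birthday approximation: P ~= 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumes hash values are uniform and independent.
    """
    buckets = 2 ** hash_bits
    exponent = n_items * (n_items - 1) / (2 * buckets)
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    for n in (5_000, 10_000):
        print(f"{n} items: {collision_probability(n):.2%}")
```

Under that assumption, 10,000 items give a little over 1%, and 5,000 items give roughly 0.3%.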
For any good hash function, the individual bits should be approximately random, so no 32-bit slice of the digest should be better than any other. You should therefore be safe using just the first 32 bits of an MD5 hash.
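The three variants mentioned in the question (first 32 bits, last 32 bits, XOR of the four 32-bit words) could be sketched as follows. The helper names are hypothetical, and the big-endian word order is an assumption chosen so that the first word matches the first 8 hex digits of the digest.

```python
import hashlib
import struct

def md5_words(data: bytes):
    """Split the 16-byte MD5 digest into four big-endian 32-bit words."""
    digest = hashlib.md5(data).digest()
    return struct.unpack(">4I", digest)

def hash32_first(data: bytes) -> int:
    """First 32 bits, i.e. the first 8 hex digits of the digest."""
    return md5_words(data)[0]

def hash32_last(data: bytes) -> int:
    """Last 32 bits, i.e. the last 8 hex digits of the digest."""
    return md5_words(data)[3]

def hash32_xor(data: bytes) -> int:
    """XOR of all four 32-bit words."""
    a, b, c, d = md5_words(data)
    return a ^ b ^ c ^ d

if __name__ == "__main__":
    sample = b"some data object"
    print(f"{hash32_first(sample):08x}")
    print(f"{hash32_last(sample):08x}")
    print(f"{hash32_xor(sample):08x}")
```

If the digest bits behave as random, all three should have the same collision behavior, so the simplest one (the first word) is a reasonable default.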
Alternatively, you could use CRC32, which should be much faster to compute (and the code is about 20 lines).
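As a rough illustration of the CRC32 option, the sketch below shows both the standard-library call (`zlib.crc32`) and a short bit-by-bit version of the common reflected CRC-32 (polynomial `0xEDB88320`). The function names are hypothetical, and the bitwise version is written for clarity, not speed.

```python
import zlib

def crc32_stdlib(data: bytes) -> int:
    """CRC-32 via the standard library; the mask keeps the result unsigned."""
    return zlib.crc32(data) & 0xFFFFFFFF

def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit reflected CRC-32, equivalent to zlib's but much slower."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

if __name__ == "__main__":
    sample = b"some data object"
    print(f"{crc32_stdlib(sample):08x}")
    print(f"{crc32_bitwise(sample):08x}")
```

Both functions should return the same value for any input; in practice the library call is the one to use.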