Are the first 32 bits of an md5 hash just as "random" as any other substring?


Question


I'm looking to create a 32-bit hash of some data objects. Since I don't feel like writing my own hash function and md5 is available, my current approach is to use the first 32 bits (i.e. the first 8 hex digits) of an md5 hash. Is this acceptable?

In other words, are the first 32 bits of an md5 hash just as "random" as any other substring? Or is there any reason I'd prefer, say, the last 32 bits? Or perhaps XOR'ing the four 32-bit substrings together?

Some preemptive clarifications:

• These hashes don't need to be cryptographically secure.
• I'm not concerned with the performance of md5; it is more than fast enough for my needs.
• These hashes just need to be "random" enough that collisions are rare.
• In this system, the number of items shouldn't exceed 10,000 (realistically it's probably not going to get half that high). So in the worst case the probability of encountering any collisions at all should be about 1% (assuming a sufficiently "random" hash is found).
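The "about 1%" figure in the last point can be checked with a birthday-problem calculation. The sketch below assumes the 32-bit hash values are uniform and independent; the function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Probability that at least two of n_items share a hash value,
    assuming values are uniform and independent over 2**hash_bits."""
    space = 2 ** hash_bits
    # P(no collision) = prod_{i=0}^{n-1} (1 - i/space), summed in log space
    # so the many factors close to 1 do not lose precision.
    log_no_collision = sum(math.log1p(-i / space) for i in range(n_items))
    return 1.0 - math.exp(log_no_collision)

p_worst = collision_probability(10_000, 32)  # worst case from the question
p_half = collision_probability(5_000, 32)    # the more realistic item count

print(f"10,000 items: {p_worst:.2%}")
print(f" 5,000 items: {p_half:.2%}")
```

Under these assumptions the worst case comes out a little above 1%, and halving the item count cuts the probability to roughly a quarter of that, since it grows with the square of the number of items.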

Solution

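For any good hash function, the individual bits should be approximately random, so no 32-bit slice of the output is better than another. You should therefore be safe using just the first 32 bits of an MD5 hash; the last 32 bits, or the XOR of the four 32-bit words, would work too but offer no advantage.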
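As a minimal sketch of the three options raised in the question, using Python's standard `hashlib`. The helper names `md5_first32`, `md5_last32`, and `md5_xor32` are invented here, and big-endian byte order is an arbitrary choice made so that the result matches the first 8 hex digits of the digest.

```python
import hashlib

def md5_first32(data: bytes) -> int:
    """First 32 bits of the MD5 digest (same as the first 8 hex digits)."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes
    return int.from_bytes(digest[:4], "big")

def md5_last32(data: bytes) -> int:
    """Last 32 bits of the MD5 digest (same as the last 8 hex digits)."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[-4:], "big")

def md5_xor32(data: bytes) -> int:
    """XOR of the four 32-bit words that make up the digest."""
    digest = hashlib.md5(data).digest()
    result = 0
    for offset in range(0, 16, 4):
        result ^= int.from_bytes(digest[offset:offset + 4], "big")
    return result

sample = b"some data object"
print(f"{md5_first32(sample):08x}")
print(f"{md5_last32(sample):08x}")
print(f"{md5_xor32(sample):08x}")
```

All three return a value in the range 0 to 2**32 - 1; for non-adversarial input there is no reason to expect one to collide less often than the others, so the simplest (the first 32 bits) is a reasonable default.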

Alternatively, you could use CRC32, which should be much faster to compute (and the code is about 20 lines).
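To give a sense of how short CRC32 is, here is a sketch of a bit-at-a-time implementation of the common CRC-32 variant (reflected polynomial `0xEDB88320`, as used by zlib), checked against the standard library's `zlib.crc32`. The name `crc32_bitwise` is made up for illustration, and in practice calling `zlib.crc32` directly would be the simpler and faster choice.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Table-free CRC-32 (zlib/PNG variant), processing one bit at a time."""
    crc = 0xFFFFFFFF  # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320  # reflected polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion

sample = b"123456789"
print(f"{crc32_bitwise(sample):08x}")
print(f"{zlib.crc32(sample):08x}")
```

Both calls should print the same 32-bit value for any input.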
