Probability of hash collision

Question

I am looking for some precise math on the likelihood of collisions for MD5, SHA1, and SHA256 based on the birthday paradox.

I am looking for something like a graph that says, "If you have 10^8 keys, this is the probability. If you have 10^13 keys, this is the probability," and so on.

I have looked at tons of articles, but I am having a tough time finding anything that gives me this data. (The ideal option for me would be a formula or code that calculates this for any given hash size.)

Answer

Let's imagine we have a truly random hash function that hashes from strings to n-bit numbers. This means that there are $2^n$ possible hash codes, and each string's hash code is chosen uniformly at random from all of those possibilities.

The birthday paradox specifically says that once you've seen roughly $\sqrt{2N \ln 2} \approx 1.18\sqrt{N}$ items, where $N$ is the number of distinct possible outputs, there's a 50% chance of a collision. When the hash function produces an n-bit output, $N = 2^n$, so you'll need roughly $2^{n/2}$ hashes before you get a collision. This is why we typically pick hashes that output 256 bits: we'd need a staggering $2^{128} \approx 3.4 \times 10^{38}$ items hashed before there's a "reasonable" chance of a collision. With a 512-bit hash, you'd need about $2^{256} \approx 1.2 \times 10^{77}$ items to get a 50% chance of a collision, which is within a few orders of magnitude of the estimated number of protons in the observable universe (about $10^{80}$).
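As a quick illustration, here is a minimal Python sketch, assuming an ideal (uniformly random) hash, that evaluates the standard 50% birthday point $\sqrt{2 \cdot 2^n \cdot \ln 2}$ for the digest sizes mentioned in the question. The names `HASH_BITS` and `items_for_even_odds` are illustrative, not from any library.

```python
import math

# Output sizes (in bits) of the hashes mentioned in the question.
HASH_BITS = {"MD5": 128, "SHA-1": 160, "SHA-256": 256}


def items_for_even_odds(n_bits: int) -> float:
    """Approximate number of uniformly random n-bit hashes needed for a
    50% chance of at least one collision: sqrt(2 * 2**n * ln 2)."""
    # 2.0 ** n_bits is a float, which is fine for n_bits up to about 1000.
    return math.sqrt(2 * 2.0 ** n_bits * math.log(2))


for name, bits in HASH_BITS.items():
    print(f"{name} ({bits} bits): ~{items_for_even_odds(bits):.3g} hashes "
          f"for a 50% collision chance")
```

For a 32-bit hash the same function gives about 77,163 items, the figure usually quoted for 32-bit checksums.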

The exact probability of getting at least one collision when k strings are hashed with an n-bit hash function is

$$
1 - \frac{(2^n)!}{2^{kn}\,(2^n - k)!}
$$
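The factorials are astronomically large for real hash sizes, but the same quantity can be written as the no-collision product $\prod_{i=0}^{k-1}(1 - i/2^n)$. A minimal sketch, assuming an ideal random hash, evaluates that product in log space; the function name `exact_collision_probability` is hypothetical, and because the loop is O(k) it is only practical for modest k.

```python
import math


def exact_collision_probability(n_bits: int, k: int) -> float:
    """Exact probability that k uniformly random n-bit hashes contain at
    least one collision: 1 - (2^n)! / (2^(k*n) * (2^n - k)!)."""
    buckets = 2 ** n_bits
    if k > buckets:
        return 1.0  # pigeonhole principle: a collision is guaranteed
    # No-collision probability = product of (1 - i/2^n) for i = 0..k-1.
    # Summing logarithms avoids underflow; log1p keeps tiny terms accurate.
    log_no_collision = math.fsum(math.log1p(-i / buckets) for i in range(k))
    # -expm1(x) computes 1 - e^x without losing precision when x is tiny.
    return -math.expm1(log_no_collision)


print(exact_collision_probability(32, 77163))
```

For a 32-bit hash and 77,163 items this comes out very close to 0.5, matching the birthday estimate.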

This is a fairly tricky quantity to work with directly, but we can get a decent approximation of it using the expression

$$
1 - e^{-k^2/2^{n+1}}
$$
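This approximation is exactly what the question's "10^8 keys, 10^13 keys" table needs. A minimal sketch, assuming an ideal random hash, prints the approximate collision probability for a few key counts; the helper name `approx_collision_probability` is illustrative. It uses `math.expm1` because for cryptographic hash sizes the exponent is so tiny that a plain `1 - math.exp(-x)` would round to 0.

```python
import math

# Output sizes (in bits) of the hashes mentioned in the question.
HASH_BITS = {"MD5": 128, "SHA-1": 160, "SHA-256": 256}


def approx_collision_probability(n_bits: int, k: float) -> float:
    """Approximate collision probability 1 - exp(-k^2 / 2^(n+1))."""
    x = k * k / 2.0 ** (n_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-x)


for keys in (10**8, 10**13, 10**18):
    row = ", ".join(
        f"{name}: {approx_collision_probability(bits, keys):.3g}"
        for name, bits in HASH_BITS.items()
    )
    print(f"{keys:.0e} keys -> {row}")
```

Looping over more key counts (or plotting the result) gives the graph asked for in the question, under the random-hash assumption.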

So, to find the number of items k that gives (roughly) a collision probability p, we can solve for k:

$$
\begin{aligned}
p &\approx 1 - e^{-k^2/2^{n+1}} \\
1 - p &\approx e^{-k^2/2^{n+1}} \\
\ln(1 - p) &\approx -k^2/2^{n+1} \\
-\ln(1 - p) &\approx k^2/2^{n+1} \\
-2^{n+1}\ln(1 - p) &\approx k^2 \\
2^{(n+1)/2}\sqrt{-\ln(1 - p)} &\approx k
\end{aligned}
$$
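The last line of this derivation can be evaluated directly. A minimal sketch, assuming an ideal random hash; the helper name `items_for_probability` is hypothetical, and `math.log1p(-p)` is used so that $-\ln(1-p)$ stays accurate even for very small p.

```python
import math


def items_for_probability(n_bits: int, p: float) -> float:
    """Approximate number of random n-bit hashes giving collision
    probability p: k ~ 2^((n+1)/2) * sqrt(-ln(1 - p))."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # -log1p(-p) is -ln(1 - p), computed accurately even for tiny p.
    return 2.0 ** ((n_bits + 1) / 2) * math.sqrt(-math.log1p(-p))


for p in (1e-18, 1e-6, 0.5):
    print(f"p={p:g}: MD5 ~{items_for_probability(128, p):.3g}, "
          f"SHA-256 ~{items_for_probability(256, p):.3g}")
```

Plugging the result back into $1 - e^{-k^2/2^{n+1}}$ recovers p, which is a convenient sanity check.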

As one last approximation, assume p is very small. Then $\ln(1 - p) \approx -p$, so we can rewrite this as

$$
k \approx 2^{(n+1)/2}\sqrt{p}
$$
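To see what the small-p shortcut costs, a short sketch can compare it with the estimate that keeps $-\ln(1-p)$. Both helper names are illustrative; the ratio of the two depends only on p, not on the hash size.

```python
import math


def items_small_p(n_bits: int, p: float) -> float:
    """Small-p shortcut k ~ 2^((n+1)/2) * sqrt(p), from ln(1 - p) ~ -p."""
    return 2.0 ** ((n_bits + 1) / 2) * math.sqrt(p)


def items_full(n_bits: int, p: float) -> float:
    """Same estimate without the small-p step: 2^((n+1)/2) * sqrt(-ln(1 - p))."""
    return 2.0 ** ((n_bits + 1) / 2) * math.sqrt(-math.log1p(-p))


# Ratio of the two estimates for a 256-bit hash at several target probabilities.
for p in (1e-15, 1e-6, 0.01, 0.5):
    ratio = items_small_p(256, p) / items_full(256, p)
    print(f"p={p:g}: small-p estimate / full estimate = {ratio:.6f}")
```

At p = 0.5 the shortcut underestimates k by about 15%, while at p = 0.01 it is within about 0.3%, so it is safe in the tiny-probability regime this answer is concerned with.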

Notice that there's still a monster $2^{(n+1)/2}$ term here, so for a 256-bit hash that leading term is $2^{128.5}$, which is just enormous. For example, how many items must we see to get a $2^{-50}$ chance of a collision with a 256-bit hash? That would be approximately

$$
2^{(256+1)/2}\sqrt{2^{-50}} = 2^{257/2} \cdot 2^{-50/2} = 2^{207/2} = 2^{103.5}.
$$

So you'd need a staggeringly huge number of hashes to have even a vanishingly small chance of getting a collision. For scale, $2^{103.5}$ is about $1.4 \times 10^{31}$; at one nanosecond per hash, computing that many would take about $1.4 \times 10^{22}$ seconds, more than 30,000 times the current age of the universe (roughly $4.4 \times 10^{17}$ seconds). And after all that, you'd get a collision probability of only $2^{-50}$, which is about $10^{-15}$.
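The exponent arithmetic and the time estimate for this example can be double-checked in log2 space. In this sketch the age of the universe (about 13.8 billion years, using 365.25-day years) and the one-hash-per-nanosecond rate are stated assumptions, not measured figures.

```python
NS_PER_SECOND = 1e9
# Assumption: age of the universe ~13.8 billion years of 365.25 days each.
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600

n_bits = 256
log2_p = -50  # target collision probability 2^-50

# log2 of k ~ 2^((n+1)/2) * sqrt(p) is (n+1)/2 + log2(p)/2.
log2_k = (n_bits + 1) / 2 + log2_p / 2
k = 2.0 ** log2_k
seconds = k / NS_PER_SECOND  # assumption: one hash per nanosecond

print(f"log2(k) = {log2_k}")
print(f"k ~ {k:.2e} hashes")
print(f"time ~ {seconds:.2e} s ~ {seconds / AGE_OF_UNIVERSE_S:,.0f} ages of the universe")
```

Working with exponents rather than the raw numbers keeps the arithmetic exact for the powers of two and makes the $257/2 - 25 = 103.5$ step easy to verify.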

In fact, this is precisely why we pick such large numbers of bits for our hashes! It makes a collision extremely unlikely to occur by chance.

(Note that the hash functions we have today aren't actually truly random functions, which is why people advise against using MD5, SHA1, and others that have had security weaknesses exposed. For MD5 and SHA1, collisions can be constructed deliberately far faster than the birthday bound suggests, so these estimates only describe accidental collisions.)

Hope this helps!
