Is any 64-bit portion of a 128-bit hash as collision-proof as a 64-bit hash?

Question

We're trying to settle an internal debate on our dev team:

We're looking for a 64-bit PHP hash function. We found a PHP implementation of MurmurHash3, but MurmurHash3 is either 32-bit or 128-bit, not 64-bit.

Co-worker #1 believes that to produce a 64-bit hash from MurmurHash3, we can simply slice the first (or last, or any) 64 bits of the 128-bit hash and that it will be as collision-proof as a native 64-bit hash function.

Co-worker #2 believes that we must find a native 64-bit hash function to reduce collisions, and that 64-bit slices of a 128-bit hash will not be as collision-proof as a native 64-bit hash.

Who is right?

Does the answer change if we take the first (or last, or any) 64 bits of a cryptographic hash like SHA-1 instead of Murmur3?

Answer

If you had truly random, uniformly distributed values, then "slicing" would yield exactly the same results as if you had used the smaller value from the start. To see why, consider this very simple example: say your random generator outputs 3 random bits, but you only need one random bit to work with. Let's assume the output is

b1 b2 b3

The possible values are

000, 001, 010, 011, 100, 101, 110, 111

and each occurs with equal probability 1/8. Now, whichever bit you slice from those three for your purpose (the first, second or third), the probability of getting a '1' is always 1/2, regardless of the position, and the same is true for a '0'.
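The 3-bit argument is small enough to check by brute force. The following Python sketch (the language and the helper name `probability_of_one` are choices made here for illustration, not part of the original answer) enumerates all eight outputs and computes the probability of a '1' at each position:

```python
from fractions import Fraction
from itertools import product

# All 3-bit outputs of the hypothetical generator, each with probability 1/8.
outcomes = list(product([0, 1], repeat=3))

def probability_of_one(position):
    """Probability that the bit at `position` (0, 1 or 2) equals 1."""
    ones = sum(1 for bits in outcomes if bits[position] == 1)
    return Fraction(ones, len(outcomes))

for position in range(3):
    print(f"P(b{position + 1} = 1) = {probability_of_one(position)}")
```

Each position comes out as 1/2, which is the claim made above for a uniformly distributed output.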

You can easily scale this experiment to the 64-out-of-128-bit case: regardless of which bits you slice, the probability of ending up with a one or a zero in a given position is one half. This means that if your sample is drawn from a uniformly distributed random variable, slicing makes collisions neither more nor less likely than they would be for a value that had 64 bits to begin with.
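For uniformly random values, the chance of a collision therefore depends only on how many bits are kept, not on which ones. As a rough illustration, the standard birthday approximation can be sketched in Python as follows (the function name `collision_probability` and the sample size of one billion items are assumptions chosen for this example):

```python
import math

def collision_probability(num_items, bits):
    """Approximate chance of at least one collision among num_items values
    drawn uniformly at random from 2**bits possibilities (birthday bound)."""
    pairs = num_items * (num_items - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed stably for tiny probabilities.
    return -math.expm1(-pairs / 2**bits)

for bits in (64, 128):
    print(bits, collision_probability(10**9, bits))
```

Under this approximation, a billion uniformly random 64-bit values collide with a probability of roughly 2.7%, whether those 64 bits were generated directly or sliced out of a uniformly random 128-bit value.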

A good question at this point is whether random functions are really the best we can do to prevent collisions. As it turns out, it can be shown that the probability of finding collisions increases whenever a function deviates from random.

The problem in real life is that hash functions are not random at all; on the contrary, they are boringly deterministic. But a design goal of cryptographic hash functions is this: if we didn't know their initial state, their output would be computationally indistinguishable from that of a real random function. That is, there is no computationally efficient way to tell the hash output apart from real random values. This is why a hash is already considered somewhat broken if someone finds a "distinguisher", a method that tells the hash from real random values with a probability higher than one half. Unfortunately, we can't really prove these properties for existing cryptographic hashes, but unless somebody breaks them, we may assume with some confidence that they hold. Here is an example of a paper about a distinguisher for one of the SHA-3 submissions that illustrates the process.

To summarize, unless a distinguisher is found for a given cryptographic hash, slicing is perfectly fine and does not increase the probability of a collision.
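As a minimal sketch of what slicing a cryptographic hash might look like in practice, the Python snippet below takes 64 bits out of a SHA-1 digest using the standard `hashlib` module. The helper names `sha1_slice64` and `colliding_pairs`, the test keys, and the reduction to 16 bits are all assumptions made here: real 64-bit collisions are far too rare to observe, so the experiment keeps only 16 bits of each slice and compares the number of colliding pairs against what a random function would be expected to produce.

```python
import hashlib

def sha1_slice64(data, offset=0):
    """Return 8 bytes (64 bits) of the 20-byte SHA-1 digest as an int.
    `offset` may be 0..12 and selects which 64-bit slice is used."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest[offset:offset + 8], "big")

def colliding_pairs(values):
    """Count unordered pairs of equal values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(c * (c - 1) // 2 for c in counts.values())

keys = [f"key-{i}".encode() for i in range(2000)]
# Expected colliding pairs for a random 16-bit function: about 30.5.
expected = len(keys) * (len(keys) - 1) / 2 / 2**16
for offset in (0, 6, 12):
    # Keep only the top 16 bits of each 64-bit slice to make collisions visible.
    sliced = [sha1_slice64(k, offset) >> 48 for k in keys]
    print(offset, colliding_pairs(sliced), f"expected ~{expected:.1f}")
```

If the digest behaves like random data, the counts for all three offsets should scatter around the same expected value, with no slice standing out as better or worse.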

Non-cryptographic hashes do not have to satisfy the same set of requirements as cryptographic hashes. They are usually designed to be very fast and to satisfy certain properties "under sane/benevolent conditions", but they can easily fall short if somebody tries to manipulate them maliciously. A good example of what this means in practice is the computational complexity attack on hash table implementations (hashDoS) presented earlier this year. Under normal conditions, non-crypto hashes work perfectly fine, but their collision resistance may be severely undermined by cleverly chosen inputs. This can't happen with cryptographic hash functions, because their very definition requires them to be immune to all sorts of clever inputs.

Because it is possible, sometimes even quite easy, to find a distinguisher like the one described above for the output of non-cryptographic hashes, we can immediately say that they do not qualify as cryptographic hash functions. Being able to tell the difference means that somewhere there is a pattern or bias in the output.

This fact alone implies that they deviate more or less from a random function, and thus (given what we said above) collisions are probably more likely than they would be for random functions. Finally, since collisions already occur with higher probability for the full 128 bits, this will not get better with shorter outputs; collisions will probably be even more likely in that case.
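To make the effect of biased output concrete, here is a small Python sketch using a deliberately weak toy hash. The function `toy_hash32` (a multiply-by-31 string hash) is an assumption chosen purely for illustration. It is not MurmurHash3, and well-designed non-cryptographic hashes mix their input far more thoroughly. The sketch only shows the mechanism: when the output has structure, the slice you keep can matter a great deal.

```python
def toy_hash32(data):
    """Deliberately weak 32-bit hash (h = h*31 + byte), for illustration only."""
    h = 0
    for byte in data:
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h

# Short, similar keys: "k0", "k1", ..., "k999".
keys = [f"k{i}".encode() for i in range(1000)]
hashes = [toy_hash32(k) for k in keys]

distinct_full = len(set(hashes))                      # full 32-bit values
distinct_low16 = len({h & 0xFFFF for h in hashes})    # low 16-bit slice
distinct_high16 = len({h >> 16 for h in hashes})      # high 16-bit slice
print(distinct_full, distinct_low16, distinct_high16)
```

For these short keys the full 32-bit values are all distinct, yet the high 16-bit slice can take at most 50 different values because the upper bits are barely touched, so nearly every key collides with another one. A random function would spread 1000 keys over close to 1000 different 16-bit values in either slice.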

tl;dr You're safe truncating a cryptographic hash function. But you're better off with a "native" 64-bit cryptographic hash function than with a non-cryptographic hash with a larger output truncated to 64 bits.
