Is any 64-bit portion of a 128-bit hash as collision-proof as a 64-bit hash?


Question


We're trying to settle an internal debate on our dev team:

We're looking for a 64-bit PHP hash function. We found a PHP implementation of MurmurHash3, but MurmurHash3 is either 32-bit or 128-bit, not 64-bit.

Co-worker #1 believes that to produce a 64-bit hash from MurmurHash3, we can simply slice the first (or last, or any) 64 bits of the 128-bit hash and that it will be as collision-proof as a native 64-bit hash function.

Co-worker #2 believes that we must find a native 64-bit hash function to reduce collisions and that 64-bit slices of a 128-bit hash will not be as collision proof as a native 64-bit hash.

Who's correct?

Does the answer change if we take the first (or last, or any) 64-bits of a cryptographic hash like SHA1 instead of Murmur3?

Solution

If you had truly random, uniformly distributed values, then "slicing" would yield exactly the same results as if you had started with the smaller value in the first place. To see why, consider this very simple example: say your random generator outputs 3 random bits, but you only need one random bit to work with. Assume the output is

b1 b2 b3

The possible values are

000, 001, 010, 011, 100, 101, 110, 111

and all occur with equal probability of 1/8. Now whichever bit you slice from those three for your purpose (the first, second or third), the probability of getting a '1' is always 1/2, regardless of the position, and the same is true for a '0'.
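The 3-bit example is small enough to check by brute force. The following is just a sketch of that enumeration; the helper name `ones_per_position` is made up for illustration.

```python
from itertools import product

def ones_per_position(n_bits):
    """Count how often each bit position is 1 across all n_bits-bit values."""
    counts = [0] * n_bits
    for bits in product((0, 1), repeat=n_bits):
        for i, b in enumerate(bits):
            counts[i] += b
    return counts

# Each of the three positions is 1 in exactly 4 of the 8 possible values.
print(ones_per_position(3))  # [4, 4, 4]
```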

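You can easily scale this experiment to the 64-out-of-128-bit case: regardless of which bits you slice, the probability of ending up with a one or a zero in a given position is one half. Because the bits of a uniformly distributed value are also independent of each other, any 64 of the 128 bits are themselves uniformly distributed over all 2^64 possible values. What this means is that if you had a sample taken from a uniformly distributed random variable, slicing would make collisions neither more nor less likely than for a 64-bit value drawn uniformly in the first place.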
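To put rough numbers on "the probability for collisions", here is a sketch using the standard birthday approximation for an ideal, uniformly distributed output. It says nothing about any particular real hash function, and the function name is an assumption for illustration.

```python
import math

def collision_probability(n_items, bits):
    """Approximate chance of at least one collision among n_items
    uniformly random values of the given bit width (birthday bound)."""
    space = 2.0 ** bits
    x = n_items * (n_items - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-x)

# One million items: tiny at 64 bits, far smaller still at 128 bits.
print(collision_probability(10**6, 64))   # about 2.7e-08
print(collision_probability(10**6, 128))
# Around 2**32 items a uniform 64-bit value starts to collide noticeably.
print(collision_probability(2**32, 64))
```

The point for the debate: a uniform 64-bit slice follows the 64-bit line of this calculation no matter which 64 bits were kept.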

Now a good question is whether random functions are really the best we can do to prevent collisions. But as it turns out, it can be shown that the probability of finding collisions increases whenever a function deviates from random.

Cryptographic hash functions: co-worker #1 wins

The problem in real life is that hash functions are not random at all; on the contrary, they are boringly deterministic. But a design goal of cryptographic hash functions is as follows: if we didn't know their initial state, then their output would be computationally indistinguishable from that of a real random function, that is, there's no computationally efficient way to tell the difference between the hash output and real random values. This is why you'd consider a hash already kind of broken if somebody finds a "distinguisher", a method to tell the hash output from real random values with a probability higher than one half. Unfortunately, we can't really prove these properties for existing cryptographic hashes, but unless somebody breaks them, we may assume with some confidence that these properties hold. Here is an example of a paper about a distinguisher for one of the SHA-3 submissions that illustrates the process.

To summarize, unless a distinguisher is found for a given cryptographic hash, slicing is perfectly fine and does not increase the probability of a collision.
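As a sketch of what slicing a cryptographic hash looks like in practice, the snippet below takes 8 bytes of a SHA-1 digest using Python's standard `hashlib`. The helper name `sha1_64` and its `offset` parameter are assumptions for illustration; the question is about PHP, where the same idea would apply to that language's own hash functions.

```python
import hashlib

def sha1_64(data: bytes, offset: int = 0) -> int:
    """Return 64 bits (8 bytes) of the SHA-1 digest of data,
    starting at the given byte offset, as an unsigned integer."""
    digest = hashlib.sha1(data).digest()  # 20 bytes = 160 bits
    if not 0 <= offset <= len(digest) - 8:
        raise ValueError("offset must leave room for 8 bytes")
    return int.from_bytes(digest[offset:offset + 8], "big")

first = sha1_64(b"hello")             # first 64 bits
last = sha1_64(b"hello", offset=12)   # last 64 bits
print(hex(first), hex(last))
```

Under the indistinguishability assumption above, it should not matter which offset is chosen, as long as the same one is used consistently.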

Non-cryptographic hash functions: co-worker #2 might win

Non-cryptographic hashes do not have to satisfy the same set of requirements as cryptographic hashes do. They are usually designed to be very fast and to satisfy certain properties "under sane/benevolent conditions", but they might easily fall short if somebody tries to maliciously manipulate them. A good example of what this means in practice is the computational complexity attack on hash table implementations (hashDoS) presented earlier this year. Under normal conditions, non-crypto hashes work perfectly fine, but their collision resistance may be severely undermined by some clever inputs. This is not supposed to happen with cryptographic hash functions, because their very definition requires them to resist all sorts of clever inputs.
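To illustrate what "clever inputs" can look like, here is a sketch using a Java-style polynomial string hash (h = 31*h + c). This is a stand-in chosen for simplicity, not MurmurHash3, and the function names are made up for the example. The strings "Aa" and "BB" hash to the same value, so any concatenation of those two blocks collides as well.

```python
from itertools import product

def poly31_hash(s: str) -> int:
    """Java-style 32-bit polynomial string hash: h = 31*h + ord(c) mod 2**32."""
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return h

def colliding_strings(n_blocks: int):
    """Build 2**n_blocks distinct strings that all share one hash value."""
    return ["".join(parts) for parts in product(("Aa", "BB"), repeat=n_blocks)]

keys = colliding_strings(4)
# 16 distinct strings, but only 1 distinct hash value among them.
print(len(set(keys)), len({poly31_hash(k) for k in keys}))  # 16 1
```

An attacker who feeds such keys to a hash table built on this function forces every key into the same bucket, which is the essence of the hashDoS attack.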

Because it is possible, sometimes even quite easy, to find a distinguisher as described above for the output of non-cryptographic hashes, we can immediately say that they do not qualify as cryptographic hash functions. Being able to tell the difference means that somewhere there is a pattern or bias in the output.

And this fact alone implies that they deviate more or less from a random function, and thus (given what we said above) collisions are probably more likely than they would be for a random function. Finally, since collisions already occur with higher probability for the full 128 bits, this will not get better with shorter outputs; collisions will probably be even more likely in that case.
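One way to test a concrete candidate instead of arguing in the abstract is to count collisions at a deliberately tiny width, where they are easy to observe, and compare the count with the roughly n^2 / (2 * 2^b) expected for n uniform b-bit values. The sketch below does this for 16-bit slices of SHA-256; the helper names are illustrative, and the hash under test (for example a MurmurHash3 implementation) would be swapped in for `sha256_slice16`.

```python
import hashlib

def count_collisions(hash_fn, keys):
    """Number of keys whose hash value was already produced by an earlier key."""
    seen = set()
    collisions = 0
    for key in keys:
        h = hash_fn(key)
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions

def sha256_slice16(key: bytes, offset: int = 0) -> int:
    """16-bit slice (2 bytes at offset) of SHA-256; tiny on purpose."""
    return int.from_bytes(hashlib.sha256(key).digest()[offset:offset + 2], "big")

keys = [str(i).encode() for i in range(2000)]
expected = len(keys) ** 2 / (2 * 2 ** 16)  # about 30.5 for a uniform hash
first = count_collisions(lambda k: sha256_slice16(k, 0), keys)
last = count_collisions(lambda k: sha256_slice16(k, 30), keys)
print(expected, first, last)
```

A slice whose count sits far above the estimate across many key sets would hint at the kind of bias described above. A count near the estimate on benign keys proves little, though, since a weak hash may only misbehave on adversarial input.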

tl;dr: You're safe with a cryptographic hash function when truncating it. But you're better off with a "native" 64-bit cryptographic hash function than with a non-cryptographic hash with a larger output truncated to 64 bits.
