How to test a hash function?

Question

Is there a way to test the quality of a hash function? I want a good spread when it is used in a hash table, and it would be great if this were verifiable in a unit test.

EDIT: For clarification, my problem was that I used long values in Java in such a way that the first 32 bits encoded one ID and the second 32 bits encoded another ID. Unfortunately, Java's hash of a long value just XORs the first 32 bits with the second 32 bits, which in my case led to very poor performance when the values were used as keys in a HashMap. So I need a different hash, and I would like a unit test so that this problem cannot creep in again.
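
To make the problem concrete, here is a minimal sketch of how packed IDs collide under Long.hashCode, which is documented as (int) (value ^ (value >>> 32)). The class name PackedIdHashDemo and the helper pack are hypothetical, introduced only for illustration.

```java
public class PackedIdHashDemo {
    // Hypothetical helper: packs two 32-bit IDs into one long,
    // first ID in the upper 32 bits, second ID in the lower 32 bits.
    static long pack(int firstId, int secondId) {
        return ((long) firstId << 32) | (secondId & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // Swapping the two IDs gives the same hash, because XOR is symmetric.
        System.out.println(Long.hashCode(pack(1, 2))); // prints 3
        System.out.println(Long.hashCode(pack(2, 1))); // prints 3
        // Two equal IDs always hash to 0.
        System.out.println(Long.hashCode(pack(7, 7))); // prints 0
    }
}
```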

Recommended answer

You have to test your hash function using data drawn from the same (or similar) distribution that you expect it to work on. When looking at hash functions on 64-bit longs, the default Java hash function is excellent if the input values are drawn uniformly from all possible long values.

However, you've mentioned that your application uses the long to store essentially two independent 32-bit values. Try to generate a sample of values similar to the ones you expect to actually use, and then test with that.

For the test itself, take your sample input values, hash each one, and put the results into a set. Then compare the size of the resulting set to the number of distinct inputs: the difference is the number of collisions your hash function is generating.
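
The procedure above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive test: the class HashCollisionCheck, the methods samplePackedIds and countCollisions, and the choice of 100 IDs per half are all hypothetical, and the sample generator should be replaced with IDs that resemble your real data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongToIntFunction;

public class HashCollisionCheck {
    // Hypothetical sample: every pair of IDs in [0, idsPerHalf),
    // packed as (firstId << 32) | secondId.
    static List<Long> samplePackedIds(int idsPerHalf) {
        List<Long> sample = new ArrayList<>();
        for (int first = 0; first < idsPerHalf; first++) {
            for (int second = 0; second < idsPerHalf; second++) {
                sample.add(((long) first << 32) | second);
            }
        }
        return sample;
    }

    // Distinct inputs minus distinct hash values = number of collisions.
    static int countCollisions(List<Long> inputs, LongToIntFunction hash) {
        Set<Long> distinctInputs = new HashSet<>(inputs);
        Set<Integer> distinctHashes = new HashSet<>();
        for (long value : distinctInputs) {
            distinctHashes.add(hash.applyAsInt(value));
        }
        return distinctInputs.size() - distinctHashes.size();
    }

    public static void main(String[] args) {
        List<Long> sample = samplePackedIds(100);
        int collisions = countCollisions(sample, v -> Long.hashCode(v));
        System.out.println(collisions + " collisions in " + sample.size() + " keys");
    }
}
```

In a unit test, the same countCollisions call could be wrapped in an assertion that the collision count stays below a threshold you choose for your data.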

For your particular application, instead of simply XORing the two halves together, try combining the 32-bit values the way a typical good hash function would combine two independent ints: multiply one by a prime, then add the other.
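
A minimal sketch of that combination is shown here. The class name PackedIdHash, the helper pack, and the multiplier are assumptions: 16777619 (the 32-bit FNV prime) is just one possible choice, not one the answer prescribes.

```java
public class PackedIdHash {
    // Assumed multiplier: 16777619 is prime; any reasonably large prime could be tried.
    private static final int PRIME = 16777619;

    // Hypothetical helper: first ID in the upper 32 bits, second ID in the lower 32 bits.
    static long pack(int firstId, int secondId) {
        return ((long) firstId << 32) | (secondId & 0xFFFFFFFFL);
    }

    // Splits the long back into its two IDs, multiplies the first by a prime
    // and adds the second (int overflow simply wraps around).
    static int hash(long packed) {
        int firstId = (int) (packed >>> 32);
        int secondId = (int) packed;
        return PRIME * firstId + secondId;
    }

    public static void main(String[] args) {
        // Unlike the XOR-based hash, swapping the IDs now changes the result.
        System.out.println(hash(pack(1, 2))); // prints 16777621
        System.out.println(hash(pack(2, 1))); // prints 33555239
    }
}
```

To use such a hash in a HashMap, one option is a small key class that holds the packed value and returns this result from hashCode. The multiplier matters: a small prime such as 31 can still collide whenever the second ID ranges over more values than the multiplier, so it is worth running the candidate against your sample data.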
