获得与此哈希码函数的HashCode冲突的可能性有多大? [英] How likely is it to get a HashCode collision with this hashcode function?

查看:151
本文介绍了获得与此哈希码函数的HashCode冲突的可能性有多大?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述


  1. 对于键[0]的随机int值,它在多大程度上获得与下面的函数的HashCode冲突],键[1],键[2],键[3]

  2. 带有以下约束的随机键值

    • 键[0]< 1,000,000

    • 键[1]< 10,000

    • 键[2]< 1,000

    • 键[3]< 1,000


假设我们有1000万

  int [] key = new int [4]; 
public override int GetHashCode()
{
//使用大的素数来创建唯一的哈希键
//使用偶数幂减去2创建散列偏移量方法,这给了
//大部分时间。
int hashKey = 0;
hashKey + = 2047 * key [0];
hashKey + = 8191 * key [1];
hashKey + = 32767 * key [2];
hashKey + = 131071 * key [3];
return hashKey;
}


解决方案

我写了一个快速脚本

  import random 

def hash(key):
hashKey = 0
hashKey + = 2047 * key [0]
hashKey + = 8191 * key [1]
hashKey + = 32767 * key [2]
hashKey + = 131071 * key [ 3]
return hashKey

seen = set()
collisions = 0
for i in range(0,10000000):
x = hash([ random.randint(0,1000000),random.randint(0,10000),random.randint(0,1000),random.randint(0,1000)])
如果看到x:
碰撞+ = 1
else:
seen.add(x)

打印冲突

当我运行它告诉我我有23735次碰撞。
我也尝试了一百万个元素,我有247个碰撞。这两个数字是4次以上的平均值。


How likely is it to get a HashCode collision with the function below in following scenarios.

  1. With random int values for key[0],key[1], key[2], key[3]
  2. With random key values with the following constraints
    • key[0] <1,000,000
    • key[1] <10,000
    • key[2] <1,000
    • key[3] <1,000

Assume we have 10 Million objects.

int[] key=new int[4];    
public override int GetHashCode()
{
    // Use large prime multiples to create a unique hash key
    // Create the hash offsets using a "even powers of 2 minus 1" method, which gives 
    // primes most of the time.  
    int hashKey = 0;
    hashKey += 2047 * key[0];
    hashKey += 8191 * key[1];
    hashKey += 32767 * key[2];
    hashKey += 131071 * key[3];
    return hashKey;
}

解决方案

I wrote a quick script to test this.

import random

def hash(key):
    hashKey = 0
    hashKey += 2047 * key[0]
    hashKey += 8191 * key[1]
    hashKey += 32767 * key[2]
    hashKey += 131071 * key[3]
    return hashKey

seen = set()
collisions = 0
for i in range(0,10000000):
    x = hash([random.randint(0,1000000), random.randint(0,10000), random.randint(0,1000), random.randint(0,1000)])
    if x in seen:
        collisions += 1
    else:
        seen.add(x)

print collisions

When I ran it, it told me I got 23735 collisions. I also tried it on one million elements, and I got 247 collisions. Both numbers are averages over 4 runs.

这篇关于获得与此哈希码函数的HashCode冲突的可能性有多大?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆