散列用于散列表的字符串(Double Hashing) [英] Hashing a string for use in hash table (Double Hashing)

查看:266
本文介绍了散列用于散列表的字符串(Double Hashing)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用Double Hashing将一个String键散列到一个散列表中。我做了这样的事情:

  protected int getIndex(String key){
int itr = 0,
size = this.values.length,
index1,
index2,
index = 0;

do {
//做双重哈希以获取curr [itr](iteration)的索引
index1 = Math.abs(key.hashCode())%size;
index2 = size - ((key + key +#!@)。hashCode()%size); #尽力消除冲突,但仍然失败...... TA和AT在size = 5时得到索引2
index =(index1 +(itr * index2))%size;

// if itr>设置阈值,退出
itr ++;
if(itr> 200){
index = -1;
休息;
}

//一旦找到索引,退出循环
} while(index> 0& this.keys [index]!= null&&! this.keys [指数] .equals(键));

回报指数;
}

主要部分是之后的前3行。我可以说如果我使用Double Hashing,它应该消除碰撞的可能性? size 是我的哈希表的唯一键的总可能值

解决方案

所以我在这里看到两件事:


  1. 使用两种不同的散列并将它们结合起来以获得更多分布式散列

  2. 如果一个哈希失败,尝试一个新的点,沿着

进行一点点乍一看,它看起来像两个这些是减少散列冲突的好方法。然而,仔细观察,这两者都会陷入真正的算法问题。

结合两个哈希值

哈希算法被设计成在整个频谱上分布相当好。就像如何将两个随机数加在一起不会给你更多随机数一样,将两个哈希加在一起并不能让你获得更多的分布。事实上,将两个理想的发行版加在一起总会让你的东西分布不均匀。因此,使用相同的底层算法的任何一种双重哈希策略,都比单一的哈希策略差。



尝试新点 />
尝试一种算法,如果第一次发生碰撞时会尝试新的哈希值,这很诱人。但是,这会导致算法的检索部分出现问题。当你把东西放在哈希中时,它会碰到另一个地方。然后当你去检索价值时,它不在那里。更糟糕的是,你是否能够找到它取决于第一个元素是否仍然存在。如果它被删除了,那么就不可能知道你要找的物品是否还在,或者它不在那里。最终,.contains测试必须经过200次迭代才能确定它所寻找的散列不存在。



最好的解决方案是使用Java提供的开箱即用哈希。如果碰到很多碰撞,最好在散列中使用较低的加载因子。这增加了桶的数量,并导致碰撞的可能性较小。


I am trying to use Double Hashing to hash a String key into a hash table. I did something like:

protected int getIndex(String key) {
  int itr = 0,
      size = this.values.length,
      index1,
      index2,
      index = 0;

  do {
    // do double hashing to get index for curr [itr] (iteration)
    index1 = Math.abs(key.hashCode()) % size;
    index2 = size - ((key + key + "#!@").hashCode() % size); # trying very hard to eliminate clash, but still fails ... TA and AT gets index 2 when size = 5
    index = (index1 + (itr * index2)) % size;

    // if itr > set threshold, exit
    itr++;
    if (itr > 200) {
      index = -1;
      break;
    }

    // once index found, exit loop
  } while (index > 0 && this.keys[index] != null && !this.keys[index].equals(key));

  return index;
}

Main part is the 1st 3 lines after the do. Can I say if I use Double Hashing, it should eliminate the probability of collision? size is total possible values of unique keys for my hash table

解决方案

So I see two things going on here

  1. Using two different hashes and combining them in attempt to get a more distributed hash
  2. If a hash fails, trying a new spot a little farther along

At first blush, it looks like both of these are a good way to reduce hash collisions. However, on closer inspection, both of these fall into real algorithmic trouble.

Combining two hashes
Hashing algorithms are designed to be fairly well distributed across the integer spectrum. Just like how adding two random numbers together doesn't give you anything more randomer, adding two hashes together doesn't get you something more distributed. In fact, adding two idential distributions together will ALWAYS give you something less evenly distributed. Thus any kind of double hashing strategy, using the same underlying algorithm, is worse than a single hashing strategy.

Trying a new spot
It's tempting to try an algorithm that tries a new hash if the first one collides. However, this causes problems with the retrieval part of the algorithm. When you put something in the hash, and it bumps to the another spot. Then when you go to retrieve the value, it's not there. Worse yet, whether or not you even find it depends on if the first element is still there or not. If it's been removed, then it's impossible to tell if item you're looking for is further along, or if it just isn't there. Ultimately a .contains test would have to go through all 200 iterations before it could be sure that the hash it's looking for isn't there.

The best solution is to use the out of the box hash provided by Java. If you're getting a lot of collisions, the best thing to use a lower load factor in the hash. This increases the number of buckets, and causes collisions to be less likely.

这篇关于散列用于散列表的字符串(Double Hashing)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆