从线性探测到二次探测(哈希冲突) [英] Moving from Linear Probing to Quadratic Probing (hash collisons)

查看:717
本文介绍了从线性探测到二次探测(哈希冲突)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我当前对哈希表的实现方式是使用线性探测,现在我想转向二次探测(后来又要进行链接,也可能要进行两次哈希运算).我读过一些文章,教程,维基百科等.但是我仍然不知道该怎么做.

My current implementation of an Hash Table is using Linear Probing and now I want to move to Quadratic Probing (and later to chaining and maybe double hashing too). I've read a few articles, tutorials, wikipedia, etc... But I still don't know exactly what I should do.

基本上,线性探测的步长为1,这很容易做到.在哈希表中搜索,插入或删除元素时,我需要计算一个哈希,为此,我这样做:

Linear Probing, basically, has a step of 1 and that's easy to do. When searching, inserting or removing an element from the Hash Table, I need to calculate an hash and for that I do this:

index = hash_function(key) % table_size;

然后,在搜索,插入或删除时,我在表中循环浏览,直到找到可用的存储桶,如下所示:

Then, while searching, inserting or removing I loop through the table until I find a free bucket, like this:

do {
    if(/* CHECK IF IT'S THE ELEMENT WE WANT */) {
        // FOUND ELEMENT

        return;
    } else {
        index = (index + 1) % table_size;
    }
while(/* LOOP UNTIL IT'S NECESSARY */);

至于二次探究,我认为我需要做的是改变索引"步长的计算方式,但这就是我不知道应该如何做的事情.我看过各种代码,它们都有点不同.

As for Quadratic Probing, I think what I need to do is change how the "index" step size is calculated but that's what I don't understand how I should do it. I've seen various pieces of code, and all of them are somewhat different.

此外,我还看到了Quadratic Probing的一些实现,在这些实现中,哈希函数已更改为适应该函数(但不是全部).确实需要该更改吗?还是可以避免修改哈希函数,而仍然使用二次探查?

Also, I've seen some implementations of Quadratic Probing where the hash function is changed to accommodated that (but not all of them). Is that change really needed or can I avoid modifying the hash function and still use Quadratic Probing?

阅读下面Eli Bendersky指出的所有内容之后,我认为我有了大致的了解.这是 http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx :

After reading everything pointed out by Eli Bendersky below I think I got the general idea. Here's part of the code at http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx:

15   for ( step = 1; table->table[h] != EMPTY; step++ ) {
16     if ( compare ( key, table->table[h] ) == 0 )
17       return 1;
18 
19     /* Move forward by quadratically, wrap if necessary */
20     h = ( h + ( step * step - step ) / 2 ) % table->size;
21   }

我没有两件事...他们说二次探测通常是用c(i)=i^2完成的.但是,在上面的代码中,它的作用类似于c(i)=(i^2-i)/2

There's 2 things I don't get... They say that quadratic probing is usually done using c(i)=i^2. However, in the code above, it's doing something more like c(i)=(i^2-i)/2

我已经准备好在我的代码中实现此功能,但我会简单地这样做:

I was ready to implement this on my code but I would simply do:

index = (index + (index^index)) % table_size;

...而不是:

index = (index + (index^index - index)/2) % table_size;

如果有的话,我会做的:

If anything, I would do:

index = (index + (index^index)/2) % table_size;

...因为我看过其他代码示例两次.虽然我不明白为什么...

...cause I've seen other code examples diving by two. Although I don't understand why...

1)为什么要减去步长?
2)为什么要按2潜水?

1) Why is it subtracting the step?
2) Why is it diving it by 2?

推荐答案

您无需修改​​哈希函数即可进行二次探测.二次探测的最简单形式实际上是在结果位置加上结果平方,而不是线性1、2、3.

You don't have to modify the hash function for quadratic probing. The simplest form of quadratic probing is really just adding consequent squares to the calculated position instead of linear 1, 2, 3.

此处,有很好的资源.以下是从那里得到的.当使用简单多项式c(i) = i^2时,这是二次探测的最简单形式:

There's a good resource here. The following is taken from there. This is the simplest form of quadratic probing when the simple polynomial c(i) = i^2 is used:

在更一般的情况下,公式为:

In the more general case the formula is:

您可以选择常量.

但是请记住,二次探测仅在某些情况下才有用.正如维基百科条目所述:

Keep, in mind, however, that quadratic probing is useful only in certain cases. As the Wikipedia entry states:

二次探查可提供良好的记忆力 缓存,因为它保留了一些 参考地点;但是,线性 探测具有更大的局部性,并且 因此,更好的缓存性能. 二次探测更好地避免了 可能发生的群集问题 线性探测,尽管不是 免疫.

Quadratic probing provides good memory caching because it preserves some locality of reference; however, linear probing has greater locality and, thus, better cache performance. Quadratic probing better avoids the clustering problem that can occur with linear probing, although it is not immune.


与计算机科学中的许多事物一样,二次探测的确切常数和多项式都是启发式的.是的,最简单的形式是i^2,但是您可以选择任何其他多项式. Wikipedia用h(k,i) = (h(k) + i + i^2)(mod m)给出了示例.


Like many things in computer science, the exact constants and polynomials of quadratic probing are heuristic. Yes, the simplest form is i^2, but you may choose any other polynomial. Wikipedia gives the example with h(k,i) = (h(k) + i + i^2)(mod m).

因此,很难回答您的为什么"问题.这里唯一的为什么"是为什么您根本需要二次探测?在使用其他形式的探测和获取聚簇表时遇到问题吗?还是仅仅是一项家庭作业或自学?

Therefore, it is difficult to answer your "why" question. The only "why" here is why do you need quadratic probing at all? Having problems with other forms of probing and getting a clustered table? Or is it just a homework assignment, or self-learning?

请记住,到目前为止,哈希表最常见的冲突解决技术是链接或线性探测.二次探测是一种在特殊情况下可用的启发式选项,除非您知道自己做得很好,否则我不建议您使用它.

Keep in mind that by far the most common collision resolution technique for hash tables is either chaining or linear probing. Quadratic probing is a heuristic option available for special cases, and unless you know what you're doing very well, I wouldn't recommend using it.

这篇关于从线性探测到二次探测(哈希冲突)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆