为什么哈希表的长度设置为一个素数是一个好的做法呢? [英] Why setting HashTable's length to a Prime Number is a good practice?

查看:171
本文介绍了为什么哈希表的长度设置为一个素数是一个好的做法呢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在经历埃里克利珀的最新博客文章的准则和规则GetHash code 当我打这个段:

I was going through Eric Lippert's latest Blog post for Guidelines and rules for GetHashCode when i hit this para:

我们可能更聪明在这里;只是作为一个列表自身的大小已满时,水桶集可自行调整大小为好,以确保平均长度桶保持低电平。此外,由于技术原因,它往往是一个好主意,以桶设定长度的黄金数目,而不是100有很多的改进,我们可以对这个哈希表。但是,一个天真的实现一个哈希表的这种速写将为现在要做的。我想保持它的简单。

We could be even more clever here; just as a List resizes itself when it gets full, the bucket set could resize itself as well, to ensure that the average bucket length stays low. Also, for technical reasons it is often a good idea to make the bucket set length a prime number, rather than 100. There are plenty of improvements we could make to this hash table. But this quick sketch of a naive implementation of a hash table will do for now. I want to keep it simple.

所以看起来像我失去了一些东西。为什么它是一个很好的做法,将其设置为一个素数?

So looks like i'm missing something. Why is it a good practice to set it to a prime number?.

推荐答案

假设你的水桶设置长度为2的幂 - 这使得mod的计算速度相当快。这也意味着,斗选择由顶部完全确定 M 散code位。 (这里的 M = 32 - N ,其中n为2的幂正在使用)。因此,这就像你马上扔掉的哈希code有用位。

Say your bucket set length is a power of 2 - that makes the mod calculations quite fast. It also means that the bucket selection is determine solely by the top m bits of the hash code. (Where m = 32 - n, where n is the power of 2 being used). So it's like you're throwing away useful bits of the hashcode immediately.

或在从2006年看跌这个博客帖子它:

假设你的哈希code函数结果如下哈希codeS等等{X,2倍,3倍,4倍,5倍,6倍...},那么所有这些都将在正m要集群桶,其中m = table_length / GreatestCommonFactor的数目(table_length,x)的(这是微不足道的验证/推导出这个)。现在,您可以执行下列操作,以避免集群之一:

Suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this). Now you can do one of the following to avoid clustering:

...

或者简单地使M分别使GreatestCommonFactor(table_length,x)的等于1,即通过使table_length互质,其中x等于table_length。如果x可以是几乎所有的号码,然后确保table_length是一个素数。

Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.

这篇关于为什么哈希表的长度设置为一个素数是一个好的做法呢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆