为什么将 HashTable 的长度设置为质数是一个好习惯? [英] Why setting HashTable's length to a Prime Number is a good practice?

查看:30
本文介绍了为什么将 HashTable 的长度设置为质数是一个好习惯?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读 Eric Lippert 的最新博客文章,以了解 指南和 GetHashCode 的规则,当我点击这个段落时:

I was going through Eric Lippert's latest Blog post for Guidelines and rules for GetHashCode when i hit this para:

我们可以在这里更聪明;就像 List 在它变满时调整自己的大小一样,bucket set 也可以调整自己的大小,以确保平均 bucket 长度保持较低.此外,出于技术原因,将桶集长度设为质数通常是一个好主意,而不是 100.我们可以对这个哈希表进行很多改进.但是这个哈希表的简单实现的快速草图现在就可以了.我想保持简单.

We could be even more clever here; just as a List resizes itself when it gets full, the bucket set could resize itself as well, to ensure that the average bucket length stays low. Also, for technical reasons it is often a good idea to make the bucket set length a prime number, rather than 100. There are plenty of improvements we could make to this hash table. But this quick sketch of a naive implementation of a hash table will do for now. I want to keep it simple.

所以看起来我错过了一些东西.为什么将其设置为质数是一个好习惯?

So looks like i'm missing something. Why is it a good practice to set it to a prime number?.

推荐答案

假设您的桶集长度是 2 的幂 - 这使得 mod 计算非常快.这也意味着桶的选择仅由哈希码的前 m 位决定.(其中 m = 32 - n,其中 n 是所使用的 2 的幂).所以这就像你立即扔掉了哈希码的有用部分.

Say your bucket set length is a power of 2 - that makes the mod calculations quite fast. It also means that the bucket selection is determine solely by the top m bits of the hash code. (Where m = 32 - n, where n is the power of 2 being used). So it's like you're throwing away useful bits of the hashcode immediately.

或者如这篇博文 从 2006 年开始:

Or as in this blog post from 2006 puts it:

假设您的 hashCode 函数产生以下 hashCodes {x , 2x, 3x, 4x, 5x, 6x...},那么所有这些都将聚集在 m 个桶中,其中 m = table_length/GreatestCommonFactor(table_length, x).(验证/推导这一点很简单).现在,您可以执行以下操作之一来避免聚类:

Suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this). Now you can do one of the following to avoid clustering:

...

或者简单地通过使 GreatestCommonFactor(table_length, x) 等于 1 来使 m 等于 table_length,即通过使 table_length 与 x 互质.如果 x 几乎可以是任何数字,那么请确保 table_length 是素数.

Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.

这篇关于为什么将 HashTable 的长度设置为质数是一个好习惯?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆