Why are hash table expansions usually done by doubling the size?

Question

I've done a little research on hash tables, and I keep running across the rule of thumb that when there are a certain number of entries (either max or via a load factor like 75%) the hash table should be expanded.
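
A minimal sketch (in Python, with an illustrative name `ToyHashTable` and the 75% threshold from that rule of thumb) of what such a load-factor-triggered expansion might look like:

    # A toy separate-chaining hash table that doubles its bucket array
    # once the load factor (entries / buckets) would exceed 0.75.
    class ToyHashTable:
        MAX_LOAD = 0.75  # illustrative threshold, as in the question

        def __init__(self, capacity=8):
            self.buckets = [[] for _ in range(capacity)]
            self.count = 0

        def _index(self, key):
            return hash(key) % len(self.buckets)

        def insert(self, key, value):
            if (self.count + 1) / len(self.buckets) > self.MAX_LOAD:
                self._resize(2 * len(self.buckets))  # grow by doubling
            bucket = self.buckets[self._index(key)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))
            self.count += 1

        def _resize(self, new_capacity):
            old = self.buckets
            self.buckets = [[] for _ in range(new_capacity)]
            for bucket in old:
                for key, value in bucket:
                    self.buckets[self._index(key)].append((key, value))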

Almost always, the recommendation is to double (or double plus 1, i.e., 2n+1) the size of the hash table. However, I haven't been able to find a good reason for this.

Why double the size, rather than, say, increasing it by 25%, or increasing it to the next prime number, or to the k-th next prime (e.g., three primes ahead)?

I already know that it's often a good idea to choose an initial hash table size that is a prime number, at least if your hash function uses a modulus, as in universal hashing. And I know that's why it's usually recommended to do 2n+1 instead of 2n (e.g., http://www.concentric.net/~Ttwang/tech/hashsize.htm).
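
As a hedged illustration of why power-of-two sizes can interact badly with modulus-based indexing (one reason prime or 2n+1 sizes get recommended), the snippet below (made-up hash values, not from the linked page) shows that `h % 2**k` keeps only the low k bits of the hash, while an odd modulus also mixes in the higher bits:

    # With a power-of-two table size, h % size keeps only the low bits of h,
    # so hash values sharing those low bits all land in the same bucket.
    size_pow2 = 16   # 2^4
    size_odd = 17    # a "2n+1"-style size (here also prime)

    hashes = [0x20, 0x40, 0x80, 0x100]      # all have low 4 bits equal to 0
    print([h % size_pow2 for h in hashes])  # [0, 0, 0, 0]   -> all collide
    print([h % size_odd for h in hashes])   # [15, 13, 9, 1] -> spread out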

However, as I said, I haven't seen any real explanation for why doubling or doubling-plus-one is actually a good choice rather than some other method of choosing a size for the new hash table.

(And yes, I've read the Wikipedia article on hash tables :) http://en.wikipedia.org/wiki/Hash_table

Answer

Hash tables could not claim "amortized constant-time insertion" if, for instance, resizing were done by a constant increment. In that case the cost of resizing (which grows with the size of the hash table) would make the amortized cost of one insertion linear in the total number of elements inserted. Because resizing becomes more and more expensive as the table grows, it has to happen "less and less often" to keep the amortized cost of insertion constant.
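
A small simulation (hypothetical numbers, not from the answer) makes the difference concrete: counting how many elements are re-hashed during resizes, doubling does on the order of n copies over n insertions, while a constant increment does on the order of n² copies:

    # Count how many element copies resizing causes over n insertions,
    # comparing growth by doubling with growth by a constant increment.
    def total_rehash_work(n, grow):
        capacity, size, work = 8, 0, 0
        for _ in range(n):
            if size == capacity:     # table "full": resize before inserting
                work += size         # every existing element is re-hashed
                capacity = grow(capacity)
            size += 1
        return work

    n = 100_000
    print(total_rehash_work(n, lambda c: 2 * c))   # doubling: ~1.3n copies
    print(total_rehash_work(n, lambda c: c + 64))  # +64 each time: ~n*n/128 copies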

Most implementations allow the average bucket occupation to grow until a bound fixed in advance is reached before resizing (anywhere between 0.5 and 3, which are all acceptable values). With this convention, just after resizing the average bucket occupation becomes half that bound. Resizing by doubling keeps the average bucket occupation within a band of factor 2.
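
For example, with a hypothetical bound of 1.0 on the average occupation, tracking the load factor over a run of insertions shows it oscillating between roughly 0.5 (just after a doubling) and 1.0 (just before the next one):

    # Track the load factor of a doubling table whose maximum average
    # occupation is 1.0: after the initial fill it stays within [0.5, 1.0].
    MAX_LOAD = 1.0
    capacity, size = 8, 0
    loads = []
    for _ in range(10_000):
        if (size + 1) / capacity > MAX_LOAD:
            capacity *= 2            # doubling halves the load factor
        size += 1
        if size > 8:                 # skip the initial warm-up fill
            loads.append(size / capacity)
    print(min(loads), max(loads))    # roughly 0.5 ... 1.0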

Sub-note: because of statistical clustering, you have to take an average bucket occupation as low as 0.5 if you want many buckets to have at most one element (maximum lookup speed, ignoring the complex effects of cache size), or as high as 3 if you want a minimum number of empty buckets (which correspond to wasted space).
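
Under the common assumption that bucket occupancies are approximately Poisson-distributed with mean equal to the load factor (an illustration added here, not part of the original answer), a quick computation shows both ends of that trade-off:

    # Approximate a bucket's occupancy as Poisson(load): P(k) = load^k * e^-load / k!
    import math

    def poisson(k, lam):
        return lam ** k * math.exp(-lam) / math.factorial(k)

    for load in (0.5, 3.0):
        p_empty = poisson(0, load)
        p_at_most_one = poisson(0, load) + poisson(1, load)
        print(f"load {load}: {p_empty:.0%} empty buckets, "
              f"{p_at_most_one:.0%} buckets with at most one element")
    # load 0.5: ~61% empty, ~91% with at most one element
    # load 3.0: ~5% empty, ~20% with at most one element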
