How can a HashSet offer constant time add operation?


Question




I was reading the javadocs on HashSet when I came across the interesting statement:

This class offers constant time performance for the basic operations (add, remove, contains and size)

This confuses me greatly, as I don't understand how one could possibly get constant time, O(1), performance for a comparison operation. Here are my thoughts:

If this is true, then no matter how much data I'm dumping into my HashSet, I will be able to access any element in constant time. That is, if I put 1 element in my HashSet, it will take the same amount of time to find it as if I had a googolplex of elements.

However, this wouldn't be possible if I had a constant number of buckets, or a consistent hash function, since for any fixed number of buckets, the number of elements in that bucket will grow linearly (albeit slowly, if the number is big enough) with the number of elements in the set.

Then, the only way for this to work is to have a changing hash function every time you insert an element (or every few times). A simple hash function that never has any collisions would satisfy this need. One toy example for strings could be: take the ASCII values of the string's characters and concatenate them together (since adding them together could produce collisions).
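For concreteness, here is a minimal sketch of that toy scheme (the class and method names, the zero-padding, and the use of BigInteger are illustrative choices, not anything the JDK does). Zero-padding each ASCII code to three digits keeps the concatenation unambiguous, but the value grows with the string length, which is exactly the limitation raised below.

```java
import java.math.BigInteger;

// Toy "collision-free" hash for ASCII strings: concatenate the character codes.
// Illustrative only: the result grows with the input, so it cannot serve as an
// index into a fixed-size bucket array.
public class ToyStringHash {
    static BigInteger hash(String s) {
        StringBuilder digits = new StringBuilder();
        for (char c : s.toCharArray()) {
            digits.append(String.format("%03d", (int) c)); // e.g. 'A' (65) -> "065"
        }
        return digits.length() == 0 ? BigInteger.ZERO : new BigInteger(digits.toString());
    }

    public static void main(String[] args) {
        System.out.println(hash("AB")); // "065" + "066" -> prints 65066
    }
}
```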

However, this hash function, and any other hash function of this sort will likely fail for large enough strings or numbers etc. The number of buckets that you can form is immediately limited by the amount of stack/heap space you have, etc. Thus, skipping locations in memory can't be allowed indefinitely, so you'll eventually have to fill in the gaps.

But if at some point there's a recalculation of the hash function, this can only be as fast as finding a polynomial which passes through N points, or O(nlogn).

Thus arrives my confusion. While I will believe that the HashSet can access elements in O(n/B) time, where B is the number of buckets it has decided to use, I don't see how a HashSet could possibly perform add or get functions in O(1) time.

Note: This post and this post both don't address the concerns I listed.

Solution

The number of buckets is dynamic, approximately 2n, where n is the number of elements in the set.

Note that HashSet gives amortized and average time performance of O(1), not worst case. This means we may occasionally suffer an O(n) operation.
So, when the bins get too crowded, we just create a new, bigger array and copy the elements into it.
This costs n operations, and it is done only when the number of elements in the set exceeds half the number of buckets (2n/2 = n), so the average cost of this operation is bounded by n/n = 1, a constant.
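To make the amortized argument concrete, here is a minimal chaining hash set sketch. The names (ToyHashSet, Node) and the exact resize threshold are assumptions for illustration, not the JDK's actual implementation (which, for the record, resizes at a load factor of 0.75); it simply keeps the bucket array at roughly twice the element count, as described above.

```java
import java.util.Objects;

// A minimal, illustrative chaining hash set that doubles its bucket array
// whenever the element count passes half the bucket count, mirroring the
// "~2n buckets" invariant from the answer above. Not the JDK's real code.
public class ToyHashSet<E> {
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private Node<E>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    public ToyHashSet() {
        buckets = (Node<E>[]) new Node[16];
    }

    private int indexFor(Object o, int length) {
        return (o == null ? 0 : (o.hashCode() & 0x7fffffff)) % length;
    }

    public boolean add(E e) {
        int i = indexFor(e, buckets.length);
        for (Node<E> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.value, e)) return false;   // already present
        }
        buckets[i] = new Node<>(e, buckets[i]);             // O(1): insert at chain head
        size++;
        // Keep buckets ~2 * size: the occasional O(n) rehash amortizes to O(1) per add.
        if (2 * size > buckets.length) resize(2 * buckets.length);
        return true;
    }

    public boolean contains(Object o) {
        for (Node<E> n = buckets[indexFor(o, buckets.length)]; n != null; n = n.next) {
            if (Objects.equals(n.value, o)) return true;
        }
        return false;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newLength) {
        Node<E>[] old = buckets;
        buckets = (Node<E>[]) new Node[newLength];
        for (Node<E> head : old) {                          // re-bucket every element once: O(n)
            for (Node<E> n = head; n != null; n = n.next) {
                int i = indexFor(n.value, newLength);
                buckets[i] = new Node<>(n.value, buckets[i]);
            }
        }
    }

    public int size() { return size; }
}
```

The add that triggers resize pays O(n) to re-bucket every element, but such an add can only occur after many cheap O(1) adds since the previous doubling, so the copying cost averages out to a constant per insertion, which is the amortization described above.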

Additionally, the number of collisions a HashMap (and hence a HashSet, which is backed by one) encounters per operation is also constant on average.

Assume you are adding an element x. The probability that the bucket h(x) already holds at least one element is ~n/2n = 1/2. The probability that it holds at least two elements is ~(n/2n)^2 = 1/4 (for large values of n), and so on.
This gives an average running time of 1 + 1/2 + 1/4 + 1/8 + .... Since this sum converges to 2, the operation takes constant time on average.
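Written out (a back-of-the-envelope bound under the same uniform-hashing assumption, not a formal proof), the expected work per add is bounded by a geometric series:

```latex
E[\text{work per add}] \;\le\; \sum_{k=0}^{\infty} \left(\frac{1}{2}\right)^{k}
\;=\; \frac{1}{1 - \frac{1}{2}} \;=\; 2
```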
