为什么用二进制搜索树实现哈希表? [英] Why implement a Hashtable with a Binary Search Tree?

查看:104
本文介绍了为什么用二进制搜索树实现哈希表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用数组实现哈希表时,我们继承了数组的恒定时间索引.用二进制搜索树实现哈希表的原因是什么,因为它提供了带有O(logn)的搜索?为什么不直接使用二进制搜索树呢?

When implementing a Hashtable using an array, we inherit the constant time indexing of the array. What are the reasons for implementing a Hashtable with a Binary Search Tree since it offers search with O(logn)? Why not just use a Binary Search Tree directly?

推荐答案

如果元素没有总顺序(即未为所有对定义大于"和小于",或者元素之间不一致),您无法比较所有对,因此无法使用BST直接,但是没有什么可以阻止您通过哈希值为BST编制索引-由于这是整数值,因此显然具有总顺序(尽管您仍然需要

If the elements don't have a total order (i.e. the "greater than" and "less than" is not be defined for all pairs or it is not consistent between elements), you can't compare all pairs, thus you can't use a BST directly, but nothing's stopping you from indexing the BST by the hash value - since this is an integral value, it obviously has a total order (although you'd still need to resolve collision, that is have a way to handle elements with the same hash value).

但是,与哈希表相比,BST的最大优点之一是元素是有序的-如果我们按哈希值对其进行排序,则这些元素将具有任意顺序 ,并且该优势将不再适用.

However, one of the biggest advantages of a BST over a hash table is the fact that the elements are in order - if we order it by hash value, the elements will have an arbitrary order instead, and this advantage would no longer be applicable.

关于为什么可能考虑使用BST而不是数组实现哈希表的原因,

As for why one might consider implementing a hash table using a BST instead of an array, it would:

  • 没有需要调整数组大小的缺点-对于数组,通常使用数组大小​​修改哈希值,并在数组变满时重新调整数组大小,然后重新插入所有元素,但使用BST,您只需将不变的哈希值直接插入BST中即可.

  • Not have the disadvantage of needing to resize the array - with an array, you typically mod the hash value with the array size and resize the array if it gets full, reinserting all elements, but with a BST, you can just directly insert the unchanging hash value into the BST.

如果我们希望任何单个操作都不要花费超过一定的时间(如果我们需要调整阵列的大小,这很可能会发生),而整体性能是次要的,那么这可能是有意义的,但是可能会有更好的选择解决此问题的方法.

This might be relevant if we want any individual operation to never take more than a certain amount of time (which could very well happen if we need to resize the array), with the overall performance being secondary, but there might be better ways to solve this problem.

散列冲突的风险降低了,因为您无需修改​​数组大小,因此可能的散列数量可能会大大增加.这样可以降低获得哈希表最坏情况性能的风险(在这种情况下,大部分元素哈希值都相同).

Have a reduced risk of hash collisions since you don't mod with the array size and thus the number of possible hashes could be significantly bigger. This would reduce the risk of getting the worst-case performance of a hash table (which is when a significant portion of the elements hash to the same value).

实际的最坏情况下的性能取决于您如何解决冲突.通常使用链表来完成O(n)最坏情况的性能.但是我们也可以使用BST实现O(log n)性能(如

What the actual worst-case performance is would depend on how you're resolving collisions. This is typically done with linked-lists for O(n) worst case performance. But we can also achieve O(log n) performance with BST's (as is done in Java's hash table implementation if the number of elements with some hash are above a threshold) - that is, have your hash table array where each element points to a BST where all elements have the same hash value.

可能使用较少的内存-对于数组,您不可避免地会有一些空索引,但是对于BST,这些索引根本就不需要存在.尽管这不是一个明显的优势,但如果根本不是一个优势.

Possibly use less memory - with an array you'd inevitably have some empty indices, but with a BST, these simply won't need to exist. Although this is not a clear-cut advantage, if it's an advantage at all.

如果我们假设使用不太常见的基于数组的BST实现,数组还将有一些空索引,这也需要偶尔调整大小,但这只是简单的内存副本,而不是需要使用更新的哈希值重新插入所有元素.

If we assume we use the less common array-based BST implementation, this array will also have some empty indices and this would also require the occasional resizing, but this is a simply memory copy as opposed to needing to reinsert all elements with updated hashes.

如果我们使用典型的基于指针的BST实现,则指针的额外开销似乎将超过在数组中具有一些空索引的开销(除非数组特别稀疏,这对于哈希表).

If we use the typical pointer-based BST implementation, the added cost for the pointers would seemingly outweigh the cost of having a few empty indices in an array (unless the array is particularly sparse, which tends to be a bad sign for a hash table anyway).

但是,由于我个人还没有听说过这样做,因此,从预期的O(1)到O(log n)的运行成本增加所带来的好处可能不值得.

But, since I haven't personally ever heard of this ever being done, presumably the benefits are not worth the increased cost of operations from expected O(1) to O(log n).

通常,选择实际上是在直接使用BST(无哈希值)和使用哈希表(具有数组)之间.

Typically the choice is indeed between using a BST directly (without hash values) and using a hash table (with an array).

这篇关于为什么用二进制搜索树实现哈希表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆