哈希表-使用二进制搜索树实现 [英] Hash table - implementing with Binary Search Tree

查看:86
本文介绍了哈希表-使用二进制搜索树实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

来自破解编码面试,第71页:

或者,我们可以使用BST实现哈希表.然后我们可以 保证O(log n)查找时间,因为我们可以保留树 均衡.另外,由于数组大,所以我们可以使用更少的空间 从一开始就需要分配更长的时间.

Alternatively, we can implement hash table with a BST. We can then guarantee an O(log n) lookup time, since we can keep the tree balanced. Additionally we may use less space, since a large array no longer needs to be allocated in the very beginning.

我知道链接列表,哈希表和BST的基础知识,但是我无法理解这些行.这到底是什么意思?最终的数据结构将是Trie吗?

I know the basics of linked lists, hash tables and BSTs, but I am unable to understand these lines. What does it actually mean? Would this final data structure would be a Trie?

推荐答案

该部分的完整文本指出,最后一段是您所询问的:

The full text of that section states, with the last paragraph being the one you asked about:

哈希表是一种将键映射到值以进行高效查找的数据结构.在一个 哈希表的非常简单的实现,该哈希表具有一个基础数组和 哈希函数.当您要插入对象及其键时,哈希函数会映射 整数的键,它指示数组中的索引.然后将该对象存储在 该索引.

A hash table is a data structure that maps keys to values for highly efficient lookup. In a very simple implementation of a hash table, the hash table has an underlying array and a hash function. When you want to insert an object and its key, the hash function maps the key to an integer, which indicates the index in the array. The object is then stored at that index.

不过,通常情况下,这不太可行.在上面的实现中,哈希 所有可能键的值必须唯一,否则我们可能会意外覆盖数据.这 数组必须非常大-所有可能的键的大小-才能防止这种情况 碰撞".

Typically, though, this won't quite work right. In the above implementation, the hash value of all possible keys must be unique, or we might accidentally overwrite data. The array would have to be extremely large—the size of all possible keys—to prevent such "collisions."

我们没有制作一个非常大的数组并将对象存储在索引哈希(键)上,而是 可以使数组更小,并将对象存储在索引哈希(键)%的链表中 要获取具有特定键的对象,必须在链接列表中搜索以下内容 这个键.

Instead of making an extremely large array and storing objects at index hash (key), we can make the array much smaller and store objects in a linked list at index hash (key) % array_length.To get the object with a particular key, we must search the linked list for this key.

或者,我们可以用二进制搜索树实现哈希表.然后我们可以 保证0(log n)查找时间,因为我们可以使树保持平衡.此外, 我们可以使用更少的空间,因为不再需要在 开始.

Alternatively, we can implement the hash table with a binary search tree. We can then guarantee an 0(log n) lookup time, since we can keep the tree balanced. Additionally, we may use less space, since a large array no longer needs to be allocated in the very beginning.

所以他们正在谈论使用BST(二进制搜索树)来处理冲突.将BST用作唯一数据实际上没有任何意义.因为适当调整的散列的全部要点是查找的顺序是O(1),所以 比来自BST的O(log n)好.最重要的是,使用BST完全实现哈希表 意味着它不是实际上哈希表:-)

So they're talking about using a BST (binary search tree) to handle collisions. It wouldn't actually make sense to use a BST as the sole data structure since the whole point of a properly tuned hash is that look-up is on the order of O(1), much better than the O(log n) from a BST. On top of that, using a BST to totally implement a hash table means it's not actually a hash table :-)

但是,请考虑一下,当哈希表中有冲突时,一种常见的处理方法是让每个存储桶都包含其项的链接列表.在简并的情况下(所有项目都散列到同一存储桶中),您最终只会得到一个链表,而O(1)变成O(n).

However, consider that, when you have collisions in a hash table, a frequent way to handle them is to have each bucket contain a linked list of its items. In the degenerate case (all items hashing to the same bucket), you end up with just a linked list and the O(1) turns into O(n).

因此,您有一个BST,而不是每个存储段中的链表.这样,如果单个存储桶中有很多项目(前面提到的冲突),您就不再具有O(n)搜索复杂性.

So, rather than a linked list at each bucket, you have a BST. Then you no longer have O(n) search complexity in cases where a single bucket has many items (the previously mentioned collisions).

您可以使用哈希函数在O(1)中找到存储区,然后在O(log n)中通过BST搜索是否存在冲突.在最佳情况下(每个存储桶一项),它仍然是O(1).然后,最坏的情况变为O(log n)而不是O(n).

You use the hash function to find the bucket in O(1) then search through the BST in O(log n) if there are collisions. In the best case (one item per bucket), it's still O(1). The worst case then becomes O(log n) rather than O(n).

最初使我担心该理论的唯一一件事是,他们还讨论了不再需要大量分配的事实.如果是共享的哈希/BST组合,则仍然需要分配整个哈希表,以免显得不协调.

The only thing that originally concerned me about that theory is that they also discuss the fact that a large allocation is no longer necessary. If it's a shared hash/BST combination, you still need to allocate the entire hash table so that seemed incongruous.

但是,从上下文("...因为不再需要分配 large 数组..."),看来,它们意味着它们可以由于冲突更有效地处理,因此使哈希表在双重数据结构中的组成部分更小.换句话说,与其使用链接的冲突列表而不是包含1000个元素的哈希表,还可以使用包含100个元素的哈希表,因为如果使用BST,冲突不会对搜索时间造成太大的破坏.

However, from the context ("... since a large array no longer needs to be allocated ..."), it appears that they mean they can make the hash table part of the dual data structure smaller as the collisions are more efficient to process. In other words, rather than a 1000-element hash table with linked lists for collisions, you can get away with a 100-element hash table because the collisions are not so damaging to the search time if you use a BST.

这篇关于哈希表-使用二进制搜索树实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆