链接哈希表与开放式哈希表 [英] Chained Hash Tables vs. Open-Addressed Hash Tables

查看:311
本文介绍了链接哈希表与开放式哈希表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



对于一个库,建议使用什么实现?

解决方案

维基百科的散列表文章给出了一个明显更好的解释和概述不同的哈希表方案,人们已经使用比我能够脱离我的头顶。实际上,你可能比阅读这篇文章更好,而不是问问题。 :)



表示...



链接的哈希表索引到头的指针数组的链表。每个链接列表单元格具有分配给它的键和为该键插入的值。当您要从其键中查找特定元素时,键的哈希值用于计算要跟踪的链接列表,然后遍历该特定列表以查找您之后的元素。如果散列表中的多个键具有相同的哈希值,那么您将链接到多个元素。



链接散列的缺点是必须按照指针搜索链接列表。上行空间是,链接的哈希表只会随着负载因子(哈希表中的元素与bucket数组的长度的比例)增加而线性变慢,即使它高于1。



一个开放寻址哈希表将索引到一个指向数组(key,value)的数组中。您可以使用键的哈希值来计算阵列中的哪个槽。如果哈希表中有多个密钥具有相同的哈希值,那么您可以使用一些方案来决定另一个插槽。例如,线性探测是您选择之后的下一个插槽的位置,然后是之后的下一个插槽,依此类推,直到找到与您要查找的键匹配的插槽,或者您点击空插槽(在这种情况下,键不能在那里)。



当负载因子低时,开放寻址通常比链接散列快,因为您不必遵循列表节点之间的指针。如果负载因子接近1,它会非常慢,因为在找到要查找的密钥或空插槽之前,通常必须搜索bucket数组中的许多插槽。另外,哈希表中不能有更多的元素,而不是桶数组中的条目。



为了处理所有哈希表至少变慢的事实并且在某些情况下实际上完全断开)当它们的负载因子接近1时,实际的哈希表实现使得桶阵列更大(通过分配新的桶阵列,并将元素从旧的数据复制到新的,然后释放旧的)当负载因子超过一定值(通常约为0.7)时。



以上都有很多变体。再次,请参阅维基百科的文章,这真的很好。



对于一个旨在被其他人使用的图书馆,我会强烈地推荐实验。由于它们的性能至关重要,因此通常最好使用其他人的哈希表的实现,这已经被仔细调整了。有很多开源的BSD,LGPL和GPL许可的哈希表实现。



例如,如果你正在使用GTK,那么你会发现有一个很好的 GLib中的哈希表


Can somebody explain the main differences between (advantages / disadvantages) the two implementations?

For a library, what implementation is recommended?

解决方案

Wikipedia's article on hash tables gives a distinctly better explanation and overview of different hash table schemes that people have used than I'm able to off the top of my head. In fact you're probably better off reading that article than asking the question here. :)

That said...

A chained hash table indexes into an array of pointers to the heads of linked lists. Each linked list cell has the key for which it was allocated and the value which was inserted for that key. When you want to look up a particular element from its key, the key's hash is used to work out which linked list to follow, and then that particular list is traversed to find the element that you're after. If more than one key in the hash table has the same hash, then you'll have linked lists with more than one element.

The downside of chained hashing is having to follow pointers in order to search linked lists. The upside is that chained hash tables only get linearly slower as the load factor (the ratio of elements in the hash table to the length of the bucket array) increases, even if it rises above 1.

An open-addressing hash table indexes into an array of pointers to pairs of (key, value). You use the key's hash value to work out which slot in the array to look at first. If more than one key in the hash table has the same hash, then you use some scheme to decide on another slot to look in instead. For example, linear probing is where you look at the next slot after the one chosen, and then the next slot after that, and so on until you either find a slot that matches the key you're looking for, or you hit an empty slot (in which case the key must not be there).

Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes. It gets very, very slow if the load factor approaches 1, because you end up usually having to search through many of the slots in the bucket array before you find either the key that you were looking for or an empty slot. Also, you can never have more elements in the hash table than there are entries in the bucket array.

To deal with the fact that all hash tables at least get slower (and in some cases actually break completely) when their load factor approaches 1, practical hash table implementations make the bucket array larger (by allocating a new bucket array, and copying elements from the old one into the new one, then freeing the old one) when the load factor gets above a certain value (typically about 0.7).

There are lots of variations on all of the above. Again, please see the wikipedia article, it really is quite good.

For a library that is meant to be used by other people, I would strongly recommend experimenting. Since they're generally quite performance-crucial, you're usually best off using somebody else's implementation of a hash table which has already been carefully tuned. There are lots of open-source BSD, LGPL and GPL licensed hash table implementations.

If you're working with GTK, for example, then you'll find that there's a good hash table in GLib.

这篇关于链接哈希表与开放式哈希表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆