使用不相等的哈希线性探测巨大的键序列 [英] Linear probing huge sequences of keys with unequal hash

查看:112
本文介绍了使用不相等的哈希线性探测巨大的键序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

关于线性探测(哈希表),有一件事对我来说并不直观. 如果我将散列结果的key1放入数组索引1,然后将key2->数组索引2放入.然后将key3->再放入数组索引1,这将转到数组索引3. 然后,当我搜索key3时,我应该遍历索引,这些索引包含与我的散列根本不相同的散列.这不是浪费吗?如果序列确实很大并且包含许多键(例如,我有20个元素,则为null,对于任何导致数组索引从0到20的键,我都必须遍历所有索引,尽管它们与我的索引没有相同的哈希值)我可以通过单独的链接消除这种情况.

或者我们的哈希函数(如果写得足够好)会在索引之间平均分配键,并不断将数组的大小调整为最大半满,这可以缓解这种情况吗?

解决方案

当存在许多冲突时,线性探测次优.请注意,冲突次数不仅取决于哈希值,而且还取决于表中的插槽数(通常是质数),因为索引是哈希值除以表长度的整数除法的余数./p>

但是请注意,除了将碰撞键彼此相邻之外,还可能会利用CPU缓存,这将使RAM中的许多元素一次读取.因此,(原则上)不要认为检查20个探针所花费的时间是检查一个探针所花费的时间的20倍,因为CPU内部及其缓存中发生的事情比进入RAM快得多.虽然没有魔术.如果每次比较的计算都丢弃了缓存中的内容,那么部分节省的资源将丢失.

There is one thing about linear probing (hash tables) that is not intuitive to me. If I put key1 which hash results to array index 1. Then I put key2 -> array index 2. Then I put key3 -> again array index 1, this will go to array index 3. Then when I search for key3 I should go through indexes that contain keys that do not have the same hash as mine at all. Isn't this waste? If the sequence is really big and contains many keys (for example I have 20 elements, then null, for any key that results in array index from 0 to 20 I must go through all the indexes although they do not have the same hash as mine and I can eliminate this with separate chaining).

Or this is mitigated by the fact that our hashing function (if written well enough) distributes the keys equally among the indices and we constantly resize the array to be max half full?

解决方案

Linear probing is sub-optimal when there are many collisions. Note that the number of collisions doesn't only depend on the hash, but also on the number of slots in the table (usually a prime number) because the index is the remainder of the integer division of the hash by the table length.

Note however, than having the colliding keys one next to the other might also take advantage of the CPU caches, which will bring from RAM many elements in one read. So, do not (in principle) think that the time it takes to check 20 probes is 20 times the time it takes to check one, because what happens inside the CPU and its caches is much faster than going to RAM. There is no magic though. If the computation of every comparison throws away what's in the caches, then part of the savings will be lost.

这篇关于使用不相等的哈希线性探测巨大的键序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆