How can I implement a fixed size hashmap?


Problem description


I want to implement a hashmap, but I am not allowed to let it expand. Since I know that I need to store at most N elements, I could pre-allocate an array of N elements for each bucket of my hash table, so that I can still store N elements in the worst case where all keys hash to the same bucket. But the elements that I need to store are rather big, so for large N this is a very inefficient use of memory.

Is it possible to implement a hashmap efficiently (in terms of memory) with a fixed amount of memory, e.g. by implementing a smart hashing function?

(P.S.: the key is an unsigned 32-bit integer, and I have no prior knowledge about the keys except that the key values I will receive are in a fairly small subset of the range, and this subset moves very slowly up in the range.)


I now have an implementation with two arrays of length N: one holds the elements, and the other holds the keys, so that the key at position i belongs to the element at position i. I use the modulo operation as a hash function to determine where an element should be inserted or found, and a linear probe to find the nearest empty spot in case of a collision. I think insertion and lookup are O(N) in the worst case, and I think this will work reasonably fast for the amounts of data that I am expecting. I asked the question to see if it can be done better.

Solution

For hashing, you could use the following snippet, which, incidentally, the Linux kernel uses to hash PIDs:

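    #include <stdint.h>

    /* Multiplicative hash: returns the top `bits` bits (1..32) of the 32-bit product.
       uint32_t keeps the arithmetic 32 bits wide even where unsigned long is 64 bits. */
    uint32_t hash_long(uint32_t val, unsigned int bits)
    {
        uint32_t hash = val * 0x9e370001UL;  /* product truncated to 32 bits */
        return hash >> (32 - bits);
    }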

The magic number 0x9e370001UL is a large prime number. Here's an extract from Understanding the Linux Kernel explaining the magic number:

You might wonder where the 0x9e370001 constant (= 2,654,404,609) comes from. This hash function is based on a multiplication of the index by a suitable large number, so that the result overflows and the value remaining in the 32-bit variable can be considered as the result of a modulus operation. Knuth suggested that good results are obtained when the large multiplier is a prime approximately in golden ratio to 2^32 (32 bit being the size of the 80×86's registers). Now, 2,654,404,609 is a prime near to that value that can also be easily multiplied by additions and bit shifts, because it is equal to 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1.
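As a quick check of the arithmetic in the excerpt (a worked calculation added for illustration, not part of the book's text), the golden-ratio fraction of 2^32 and the sum of powers of two come out as follows:

```latex
\[
\frac{\sqrt{5}-1}{2}\cdot 2^{32} \approx 0.6180339887 \times 4{,}294{,}967{,}296 \approx 2{,}654{,}435{,}769
\]
\[
2^{31}+2^{29}-2^{25}+2^{22}-2^{19}-2^{16}+1 = 2{,}654{,}404{,}609 = \texttt{0x9e370001}
\]
```

So the constant sits roughly 31,000 below the golden-ratio point, which is what "near" means here.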

The right shift hash >> (32 - bits); keeps only the top bits bits of the 32-bit hash value and discards the rest, so the result lies in the range 0 to 2^bits - 1. In your case, bits is determined by the limit N. For this to work as it is, N needs to be such that all the bits below its most significant set bit are set too, i.e. N = 2^bits - 1. For example, N = 7 (where the last three bits are all set and all other bits are zero) gives bits = 3, and N = 63 (where the least significant six bits are all set and all other bits are zero) gives bits = 6. Since the hash can return any value from 0 to N inclusive, the array needs N + 1 slots.

The value returned by the hash_long function is the index into your array.
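To make the relationship between the table size and bits concrete, here is a minimal sketch. The names hash32 and bits_for_size are hypothetical helpers introduced for this example: hash32 is a fixed-width variant of the answer's hash function, and bits_for_size assumes the table size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width variant of the multiplicative hash (hypothetical name).
   Returns the top `bits` bits of the 32-bit product. */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);   /* a shift by 32 would be undefined */
    uint32_t hash = val * UINT32_C(0x9e370001);
    return hash >> (32 - bits);
}

/* Hypothetical helper: number of index bits for a power-of-two table size,
   e.g. 8 slots -> 3 bits, 64 slots -> 6 bits. */
static unsigned int bits_for_size(uint32_t table_size)
{
    assert(table_size >= 2 && (table_size & (table_size - 1)) == 0);
    unsigned int bits = 0;
    while ((UINT32_C(1) << bits) < table_size)
        bits++;
    return bits;
}
```

With a table of 64 slots, hash32(key, bits_for_size(64)) always yields an index between 0 and 63.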

Handling Collisions

To handle collisions, keep just one array but make it an array of linked list nodes. So every element in the array points to a linked list. When there is a collision, just append the new entry to the end of the linked list corresponding to that slot in your array.
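One way to keep chaining within a fixed amount of memory, which the answer does not spell out, is to take list nodes from a pre-allocated pool of N nodes instead of allocating them dynamically. The sketch below assumes this pool approach; all names (chained_map, cm_put, cm_get) and the sizes are hypothetical, the int value stands in for the real element, and deletion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8                    /* assumed N: most entries ever stored */
#define BUCKET_BITS 2                    /* assumed: 2^2 = 4 buckets */
#define NUM_BUCKETS (1 << BUCKET_BITS)
#define NIL (-1)                         /* "no node" marker */

struct node {
    uint32_t key;
    int value;                           /* placeholder for the large element */
    int next;                            /* pool index of next node, or NIL */
};

struct chained_map {
    int heads[NUM_BUCKETS];              /* first node of each chain, or NIL */
    struct node pool[MAX_ENTRIES];       /* all nodes live here, no malloc */
    int used;                            /* number of pool nodes handed out */
};

static void cm_init(struct chained_map *m)
{
    for (int i = 0; i < NUM_BUCKETS; i++)
        m->heads[i] = NIL;
    m->used = 0;
}

static uint32_t cm_bucket(uint32_t key)
{
    uint32_t hash = key * UINT32_C(0x9e370001);
    return hash >> (32 - BUCKET_BITS);
}

/* Inserts or updates; returns 0 on success, -1 if the pool is exhausted. */
static int cm_put(struct chained_map *m, uint32_t key, int value)
{
    int *link = &m->heads[cm_bucket(key)];
    while (*link != NIL) {               /* walk to the end of the chain */
        if (m->pool[*link].key == key) {
            m->pool[*link].value = value;
            return 0;
        }
        link = &m->pool[*link].next;
    }
    if (m->used == MAX_ENTRIES)
        return -1;
    assert(m->used >= 0 && m->used < MAX_ENTRIES);
    struct node *n = &m->pool[m->used];
    n->key = key;
    n->value = value;
    n->next = NIL;
    *link = m->used++;                   /* append the new node to the chain */
    return 0;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
static int *cm_get(struct chained_map *m, uint32_t key)
{
    for (int i = m->heads[cm_bucket(key)]; i != NIL; i = m->pool[i].next)
        if (m->pool[i].key == key)
            return &m->pool[i].value;
    return NULL;
}
```

Memory use is N nodes plus one small index per bucket regardless of how the keys distribute, since a chain can grow to any length up to N without reserving N slots per bucket.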

Handling Collisions (Update)

If you cannot allocate new memory dynamically, then the solution you posted seems fine, although I'm not sure what the purpose of the array containing just the keys is (shouldn't a key be a member of the element that it belongs to?). Here is a suggestion building on your solution:
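To illustrate the parenthetical remark before moving on to the suggestion: a single array whose entries carry their own key, combined with linear probing, might look like the sketch below. The names (probe_map, pm_put, pm_get), the table size, and the int payload are assumptions made for the example; deletion is omitted because it would need tombstones or re-insertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 3                      /* assumed: 2^3 = 8 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

struct entry {
    bool used;                            /* false while the slot is empty */
    uint32_t key;                         /* the key lives inside the element */
    int payload;                          /* placeholder for the large element */
};

struct probe_map {
    struct entry slots[TABLE_SIZE];       /* zero-initialise before first use */
};

static uint32_t pm_home(uint32_t key)
{
    uint32_t hash = key * UINT32_C(0x9e370001);
    return hash >> (32 - TABLE_BITS);
}

/* Inserts or updates; returns 0 on success, -1 if the table is full. */
static int pm_put(struct probe_map *m, uint32_t key, int payload)
{
    assert(m != NULL);
    uint32_t home = pm_home(key);
    for (uint32_t n = 0; n < TABLE_SIZE; n++) {
        struct entry *e = &m->slots[(home + n) & (TABLE_SIZE - 1)];
        if (!e->used || e->key == key) {
            e->used = true;
            e->key = key;
            e->payload = payload;
            return 0;
        }
    }
    return -1;                            /* probed every slot, none free */
}

/* Returns the entry for key, or NULL if it is not stored. */
static struct entry *pm_get(struct probe_map *m, uint32_t key)
{
    assert(m != NULL);
    uint32_t home = pm_home(key);
    for (uint32_t n = 0; n < TABLE_SIZE; n++) {
        struct entry *e = &m->slots[(home + n) & (TABLE_SIZE - 1)];
        if (!e->used)
            return NULL;                  /* an empty slot ends the probe run */
        if (e->key == key)
            return e;
    }
    return NULL;
}
```

A table declared as struct probe_map m = {0}; starts out empty, and each probe touches only one array, which tends to be friendlier to the cache than keeping keys and elements in two parallel arrays.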

Having a 1-D array means that in case of a collision, we perform a linear probe both when inserting and when retrieving. An alternative would be a 2-D array where each inner array acts as a linked list. We will need to keep the index of the last element inserted in each of the inner arrays. The downside compared to the 1-D array is that if too many collisions happen on the same index, we may run out of space in one of the inner arrays, unless we make each inner array of length N as well, which will lead to a lot of wasted space. The advantage is that when inserting, we don't need to perform a linear probe. We just check the index of the last element in the inner array and increment it by one to get the next slot for the new element.
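The 2-D layout described above can be sketched roughly as follows. The bucket count, the per-bucket capacity, and all names (bucket_map, bm_put, bm_get) are assumptions chosen for illustration; a per-bucket count plays the role of "index of the last element inserted".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_BITS 2                     /* assumed: 2^2 = 4 buckets */
#define NUM_BUCKETS (1 << BUCKET_BITS)
#define BUCKET_CAP 3                      /* assumed capacity of each inner array */

struct item {
    uint32_t key;
    int payload;                          /* placeholder for the large element */
};

struct bucket_map {
    struct item slots[NUM_BUCKETS][BUCKET_CAP];
    unsigned int count[NUM_BUCKETS];      /* next free position in each bucket */
};

static uint32_t bm_bucket(uint32_t key)
{
    uint32_t hash = key * UINT32_C(0x9e370001);
    return hash >> (32 - BUCKET_BITS);
}

/* Returns the stored item for key, or NULL if it is absent. */
static struct item *bm_get(struct bucket_map *m, uint32_t key)
{
    uint32_t b = bm_bucket(key);
    assert(m->count[b] <= BUCKET_CAP);
    for (unsigned int i = 0; i < m->count[b]; i++)
        if (m->slots[b][i].key == key)
            return &m->slots[b][i];
    return NULL;
}

/* Inserts or updates; returns 0 on success, -1 if that bucket is full. */
static int bm_put(struct bucket_map *m, uint32_t key, int payload)
{
    struct item *existing = bm_get(m, key);
    if (existing != NULL) {
        existing->payload = payload;
        return 0;
    }
    uint32_t b = bm_bucket(key);
    if (m->count[b] == BUCKET_CAP)
        return -1;                        /* the downside: one bucket can fill up */
    m->slots[b][m->count[b]].key = key;
    m->slots[b][m->count[b]].payload = payload;
    m->count[b]++;                        /* no probing: just bump the count */
    return 0;
}
```

Note that this version still scans the bucket once to detect an existing key; if duplicates cannot occur, that scan can be dropped and insertion becomes a single write plus an increment.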
