在已知的一套整数键查找 [英] Lookups on known set of integer keys

查看:158
本文介绍了在已知的一套整数键查找的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

的gperf长期表现不佳一朱迪阵列在我的环境,我想知道是否有另一种完美的哈希库在那里专门为整数键建成。我知道一套钥匙事前,我想利用这成为一个性能/规模优势。

Gperf consistently underperforms a Judy array in my environment, and I'm wondering if there is another perfect hash library out there built specifically for integer keys. I know the set of keys beforehand, and I would like to leverage that into a performance/size advantage.

有〜1000项和检索不是按顺序。密钥对都是整数。钥匙是32位,和检索值是8位。大小是最重要的因素。

There are ~1000 keys, and retrievals are not in sequential order. Key pairs are both integers. Keys are 32-bit, and retrieved values are 8-bit. Size is the most important factor.

如果有一种方法来调整gperf实现整数键,或者只是一般的另一种方法,我洗耳恭听了。 :)

If there is a way to tweak Gperf for integer keys, or just another approach in general, I'm all ears, too. :)

(旁注:...虽然打字了这个问题,我实现了一个二进制搜索可能采取的蛋糕,我刚刚过想这个问题我还是想听听你可能是为了什么想法。学习,虽然!)

(Sidenote: ...While typing out this question, I realized a binary search probably takes the cake and I've just over-thought the problem. I'd still like to hear any thoughts you may have for the sake of learning, though!)

编辑:按键分布并不均匀。大多数是随机聚集在整个可能范围。

Keys are not evenly distributed. Most are randomly clustered throughout the entire possible range.

编辑2:最坏情况下的二进制搜索是对我的口味太慢了,所以我结束了钥匙打,直到我发现了8位来自各使用使256均匀分布桶。我持有的最小和放大器;最大每个桶(24位每个存储条目),并做了一个大的结构数组的密钥对。在标准杆/速度更快,比其他一切我为我的特定情况下测试更小,所以我想我与去现在。 :)

Edit 2: Worst-case binary searches were too slow for my taste, so I ended up playing with the keys until I found 8 bits to use from each to make 256 well-distributed buckets. I held the min & max of each bucket (24 bits for each bucket entry), and made one big struct array for the key-pairs. On-par/faster and smaller than everything else I tested with for my particular case, so I guess I'm going with that for now. :)

推荐答案

请你的钥匙排序,并使用M-树检索任意键。

Keep your keys sorted, and use an M-tree to retrieve any key.

这是M-树有每个节点M条目,而不是2个二进制文件。 这将有助于极大的性能。 使用高速缓存行大小作为节点的尺寸,因此64字节的基础。 您可以在这个尺寸存储16 32位值。

An M-tree has M entry per node, instead of 2 for binaries. It will help tremendously for performance. Use a cache line size as the basis of your node size, hence 64 Bytes. You can store 16 32bits values in this size.

由于有1000个值,3个级别将足以检索右键(相对于10个级别的二叉树)。

Since you have 1000 values, 3 levels will be enough to retrieve the right key (as opposed to 10 levels for a binary tree).

另一个想法是,以你的钥匙散列到一个小的哈希表,如12位之一(4K项),并解决潜在的冲突用一个简单的产业链。你很可能会得到您的钥匙在一个单一的搜索。

Another idea would be to hash your keys into a small hash-table, such as 12-bits one (4K entries), and solve the potential collisions with a simple chain. You are very likely to get most of your keys in a single search.

这篇关于在已知的一套整数键查找的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆