Is it possible to create a Minimal Perfect Hash function without a separate lookup table for a small (<64) set of keys?

Problem description

I recently read the article "Throw away the keys: Easy, Minimal Perfect Hashing", which describes generating a minimal perfect hash table for a known set of keys.

The article seems to assume that you need an intermediate table. Is there a simpler way to generate such a function if we assume that the set of keys is small (i.e. < 64)?

In my case, I want to map a set of thread IDs to unique blocks of data within an array. The threads are started before the hash function is generated, and the set of threads does not change afterwards. The exact number of threads varies between runs but stays fixed for the lifetime of the program:

unsigned int *thread_ids;   /* IDs of the started threads */
unsigned int thread_count;
struct {
    /* Some thread specific data */
} *ThreadData;

int start_threads(void) {
    /* Code which starts the threads and allocates the ThreadData. */
    return 0;
}

int f(unsigned int thread_id) {
    /* return unique index into ThreadData */
    return 0; /* placeholder: this is the function being asked about */
}

int main(void) {
    thread_count = 64; /* This number will be small, e.g. < 64 */
    start_threads();
    (void)ThreadData[f(thread_ids[0])]; /* access the block of the first thread */
    return 0;
}


Recommended answer

Yes, you can build a minimal perfect hash function (MPHF) at runtime. There are multiple algorithms you can use, but most of them are a bit complex to implement so I can't give you working sample code. Many are implemented in the cmph project.

The simplest one is probably BDZ. At a high level, a lookup requires calculating 3 hash functions and making 3 memory accesses. If memory isn't an issue, you only need 2. It supports millions of keys. The algorithm requires a lookup table with about 1.23 times as many entries as there are keys.

There are other algorithms, including one I invented myself, the RecSplit algorithm, but I don't have a C implementation of it, only a Java one right now. Basically, these algorithms find a way to split the set into subsets (recursively) until the subset size is 1. You need to remember how you split. The simplest solution is in fact a lookup table for "how you split", but that table is really small, possibly only 5 integers for 64 keys: one to divide the set into 4 subsets of 16 keys, and 4 more to map the keys of each subset to the numbers 0..15.
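The "5 integers for 64 keys" idea can be sketched in C as a brute-force seed search. This is not the RecSplit implementation, only a minimal illustration under assumptions: keys are distinct 32-bit values, the names `mphf_t`, `mphf_build`, `mphf_lookup` and `mix` are hypothetical, and the mixer is the MurmurHash3 32-bit finalizer applied to the key XORed with a scrambled seed.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_KEYS    64
#define BUCKET_SIZE 16
#define MAX_BUCKETS (MAX_KEYS / BUCKET_SIZE)
#define MAX_TRIES   (1u << 28)   /* give up rather than loop forever */

/* Hypothetical descriptor: one split seed plus one seed per bucket. */
typedef struct {
    uint32_t n;                        /* number of keys */
    uint32_t buckets;                  /* ceil(n / BUCKET_SIZE) */
    uint32_t split_seed;               /* chooses the bucket of a key */
    uint32_t bucket_seed[MAX_BUCKETS]; /* chooses the slot inside a bucket */
} mphf_t;

/* Seeded 32-bit mixer (MurmurHash3 finalizer on key ^ scrambled seed). */
static uint32_t mix(uint32_t key, uint32_t seed) {
    uint32_t x = key ^ (seed * 0x9E3779B9u);
    x ^= x >> 16; x *= 0x85EBCA6Bu;
    x ^= x >> 13; x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

/* Bucket b holds n / buckets keys, plus one if b < n % buckets. */
static uint32_t bucket_size(const mphf_t *h, uint32_t b) {
    return h->n / h->buckets + (b < h->n % h->buckets);
}

/* Index of the first output slot that belongs to bucket b. */
static uint32_t bucket_start(const mphf_t *h, uint32_t b) {
    uint32_t r = h->n % h->buckets;
    return b * (h->n / h->buckets) + (b < r ? b : r);
}

/* Maps each key of the build set to a distinct index in 0..n-1. */
uint32_t mphf_lookup(const mphf_t *h, uint32_t key) {
    uint32_t b = mix(key, h->split_seed) % h->buckets;
    return bucket_start(h, b) + mix(key, h->bucket_seed[b]) % bucket_size(h, b);
}

/* Returns 0 on success, -1 on failure. Keys must be distinct, 1 <= n <= 64. */
int mphf_build(mphf_t *h, const uint32_t *keys, uint32_t n) {
    uint32_t seed, i;
    if (n == 0 || n > MAX_KEYS) return -1;
    h->n = n;
    h->buckets = (n + BUCKET_SIZE - 1) / BUCKET_SIZE;

    /* Step 1: find a seed that splits the keys into buckets of the wanted sizes. */
    for (seed = 0; seed < MAX_TRIES; seed++) {
        uint32_t count[MAX_BUCKETS] = {0};
        for (i = 0; i < n; i++) {
            uint32_t b = mix(keys[i], seed) % h->buckets;
            if (++count[b] > bucket_size(h, b)) break;   /* bucket overflow */
        }
        if (i == n) break;   /* no overflow, so every bucket is exactly full */
    }
    if (seed == MAX_TRIES) return -1;
    h->split_seed = seed;

    /* Step 2: per bucket, find a seed that maps its keys to distinct slots. */
    for (uint32_t b = 0; b < h->buckets; b++) {
        uint32_t member[BUCKET_SIZE], m = 0, size = bucket_size(h, b);
        for (i = 0; i < n; i++)
            if (mix(keys[i], h->split_seed) % h->buckets == b)
                member[m++] = keys[i];
        assert(m == size);
        for (seed = 0; seed < MAX_TRIES; seed++) {
            uint32_t used = 0;   /* bitmask of occupied slots in this bucket */
            for (i = 0; i < m; i++) {
                uint32_t bit = 1u << (mix(member[i], seed) % size);
                if (used & bit) break;   /* collision, try the next seed */
                used |= bit;
            }
            if (i == m) break;
        }
        if (seed == MAX_TRIES) return -1;
        h->bucket_seed[b] = seed;
    }
    return 0;
}
```

In the question's setting, `f(thread_id)` would call `mphf_lookup` on a descriptor built once after the threads have started. The search is probabilistic: for a bucket of 16 keys roughly one seed in a million maps all keys to distinct slots, so construction takes millions of hash evaluations, while a lookup costs two hash evaluations and reads two seeds.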

(I added a second answer if you don't strictly need a minimal perfect hash function, just a perfect hash function. Construction is simpler and lookup is a lot faster, but requires a larger array.)
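The second answer itself is not reproduced here, so the following is only one possible way to trade space for simplicity, not necessarily the method it describes. The sketch assumes distinct 32-bit keys and a 1024-slot table, and searches for a single seed under which no two keys share a slot; `ph_build`, `ph_lookup`, `seeded_hash` and the table size are all hypothetical choices.

```c
#include <stdint.h>

#define TABLE_BITS 10                  /* assumption: 1024 slots for up to 64 keys */
#define TABLE_SIZE (1u << TABLE_BITS)
#define PH_MAX_TRIES 1000000u

/* Seeded 32-bit mixer (MurmurHash3 finalizer on key ^ scrambled seed). */
static uint32_t seeded_hash(uint32_t key, uint32_t seed) {
    uint32_t x = key ^ (seed * 0x9E3779B9u);
    x ^= x >> 16; x *= 0x85EBCA6Bu;
    x ^= x >> 13; x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

/* Lookup is one hash and one mask; the result is in 0..TABLE_SIZE-1. */
uint32_t ph_lookup(uint32_t seed, uint32_t key) {
    return seeded_hash(key, seed) & (TABLE_SIZE - 1);
}

/* Finds a collision-free seed for n distinct keys. Returns 0 on success. */
int ph_build(const uint32_t *keys, uint32_t n, uint32_t *seed_out) {
    if (n > TABLE_SIZE) return -1;
    for (uint32_t seed = 0; seed < PH_MAX_TRIES; seed++) {
        uint8_t used[TABLE_SIZE] = {0};   /* slots taken under this seed */
        uint32_t i;
        for (i = 0; i < n; i++) {
            uint32_t slot = ph_lookup(seed, keys[i]);
            if (used[slot]) break;        /* collision, try the next seed */
            used[slot] = 1;
        }
        if (i == n) {
            *seed_out = seed;
            return 0;
        }
    }
    return -1;
}
```

With 64 keys and 1024 slots, roughly one seed in seven is collision-free, so construction finishes almost immediately. The price is that the data array needs 1024 entries instead of 64, with most of them unused.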
