Perfect hash function generator for functions

Problem description

I have a set of C++ functions. I want to map these functions in a hash table, something like unordered_map<function<ReturnType(Args...)>, SomethingElse>, where SomethingElse is not relevant to this question.

This set of functions is known in advance, small (let's say fewer than 50) and static (it is not going to change).

Since lookup performance is crucial (it should be O(1)), I want to define a perfect hash function.

Is there a perfect hash function generator for this scenario?

I know that perfect hash function generators exist (like GPERF or CMPH), but since I've never used them, I don't know whether they're suitable for my case.

REASON:

I'm trying to design a framework where, given a program written in C++, the user can select a subset F of the functions defined in this program.

For each f belonging to F, the framework implements a memoization strategy: when we call f with input i, we store (i,o), where o is the resulting output, inside some data structure. Then, if f is called again with i, we return o without repeating the (time-expensive) computation.
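The memoization strategy described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: a single-argument function, a hashable input type, and a purely local cache. The names Memoized, slow_square and real_calls are hypothetical and do not come from the question.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an expensive function f in F.
std::uint64_t slow_square(std::uint64_t i) { return i * i; }

// Wraps a one-argument function and remembers every (i, o) pair it has seen.
template <typename R, typename A>
class Memoized {
public:
    explicit Memoized(R (*f)(A)) : f_(f) {}

    R operator()(const A& i) {
        auto it = cache_.find(i);
        if (it != cache_.end()) {
            return it->second;          // cache hit: no recomputation
        }
        ++real_calls_;
        R o = f_(i);                    // cache miss: run the real function
        cache_.emplace(i, o);
        return o;
    }

    // Number of times the wrapped function actually ran.
    int real_calls() const { return real_calls_; }

private:
    R (*f_)(A);
    std::unordered_map<A, R> cache_;
    int real_calls_ = 0;
};
```

Calling a Memoized<std::uint64_t, std::uint64_t> object twice with the same input runs slow_square only once; the second call is answered from the cache.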

The "already computed results" will be shared among different users (maybe in the cloud), so if user u1 has already computed o, user u2 will save computing time when calling f with i (using the same notation as before).

Obviously, we need to store the set of pairs (f,inputs_sets) (where inputs_sets is the set of already computed results mentioned above), which brings us back to the original question: how do I do it?

So, using the "enumeration trick" proposed in the comments could be a solution in this scenario, assuming that all the users use exactly the same enumeration. That could be a problem: supposing that our program has f1, f2 and f3, what if u1 wants to memoize only f1 and f2 (so F={f1,f2}), while u2 wants to memoize only f3 (so F={f3})? An overkill solution could be to enumerate all the functions defined in the program, but this could generate a huge waste of memory.
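One possible reading of the enumeration trick is sketched below. It assumes that every user compiles against the same FunctionId enum and that inputs and outputs are plain long values. FunctionId, SharedCache, store and lookup are hypothetical names chosen for illustration. In this particular sketch the enum lists every function, but storage grows only with the results that are actually stored, so a function nobody memoizes costs one enumerator and no table entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Hypothetical program-wide enumeration, identical for every user.
enum class FunctionId : std::uint16_t { f1 = 0, f2 = 1, f3 = 2 };

// Stores already computed results keyed by (function id, input).
class SharedCache {
public:
    void store(FunctionId f, long i, long o) { results_[{f, i}] = o; }

    // Returns the stored output, or std::nullopt if (f, i) was never stored.
    std::optional<long> lookup(FunctionId f, long i) const {
        auto it = results_.find({f, i});
        if (it == results_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    std::size_t size() const { return results_.size(); }

private:
    std::map<std::pair<FunctionId, long>, long> results_;
};
```

With this layout, u1 would only ever call store and lookup with FunctionId::f1 and FunctionId::f2, and u2 only with FunctionId::f3, while both agree on what each identifier means.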

Solution

OK, maybe not what you want to hear, but consider this: since you are talking about only a few functions, fewer than 50, the cost of the hash lookup should be negligible, even with collisions. Have you actually profiled and seen that the lookup is critical?
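For reference, a plain hash-table lookup of the kind discussed here might look like the sketch below. Note that std::function provides neither a std::hash specialization nor an equality comparison between two std::function objects, so this sketch assumes plain function pointers as keys instead. MathFn, half, cube, Table and make_table are hypothetical names, and std::string stands in for SomethingElse.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical signature shared by all functions in the set.
using MathFn = double (*)(double);

double half(double x) { return x / 2.0; }
double cube(double x) { return x * x * x; }

// std::hash is defined for pointer types, so function pointers can be keys.
using Table = std::unordered_map<MathFn, std::string>;

// Builds the (small, static) table once.
Table make_table() {
    return Table{{&half, "half"}, {&cube, "cube"}};
}
```

A lookup is then simply table.at(&half) or table.find(&cube), with the default hash and no perfect hash function involved.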

So my advice is to focus your energy on something else; most likely a perfect hash function would not bring any performance improvement in your case.

I am going to go one step further and say that for fewer than 50 elements a flat map (good ol' vector) would have similar performance (or maybe even better, due to cache locality). But again, measurements are required.
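A flat map in this sense can be as simple as a vector of key/value pairs searched linearly. The sketch below is one minimal way to write it, assuming function pointers as keys and std::string as the mapped value. FlatMap, Fn, twice, negated and inc are hypothetical names, and no performance claim is made without measuring.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature shared by all functions in the set.
using Fn = int (*)(int);

int twice(int x) { return 2 * x; }
int negated(int x) { return -x; }
int inc(int x) { return x + 1; }

// Contiguous storage with linear search; fine for a few dozen entries.
class FlatMap {
public:
    void insert(Fn key, std::string value) {
        entries_.emplace_back(key, std::move(value));
    }

    // Returns a pointer to the mapped value, or nullptr if key is absent.
    const std::string* find(Fn key) const {
        auto it = std::find_if(
            entries_.begin(), entries_.end(),
            [key](const std::pair<Fn, std::string>& e) { return e.first == key; });
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::vector<std::pair<Fn, std::string>> entries_;
};
```

Because all entries sit next to each other in memory, a scan over fewer than 50 of them touches only a handful of cache lines, which is the locality argument made above.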
