Generating k pairwise independent hash functions


Problem description


I'm trying to implement a Count-Min Sketch algorithm in Scala, and so I need to generate k pairwise independent hash functions.
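For context, the sketch below shows where the k hash functions enter a Count-Min Sketch. Each function picks one counter in its own row, and a query returns the minimum over the k rows. It is a minimal illustration, not a reference implementation. The class name CountMinSketch, the String items, and the choice to pass the hash functions in from the caller are all assumptions made for this example.

```scala
// Minimal Count-Min Sketch; CountMinSketch and its methods are hypothetical
// names. Assumptions: String items, and the k hash functions are supplied by
// the caller, so any hash family can be plugged in.
final class CountMinSketch(width: Int, hashes: IndexedSeq[String => Int]) {
  require(width > 0 && hashes.nonEmpty)

  // k rows (one per hash function), each holding `width` counters
  private val table: Array[Array[Long]] = Array.fill(hashes.length, width)(0L)

  // Turn a raw (possibly negative) 32-bit hash into a column index
  private def column(row: Int, item: String): Int =
    Math.floorMod(hashes(row)(item), width)

  // Adding an item increments one counter in every row
  def add(item: String, count: Long = 1L): Unit =
    for (row <- hashes.indices) table(row)(column(row, item)) += count

  // Estimate = minimum over rows; with non-negative counts it never
  // underestimates the true frequency
  def estimate(item: String): Long =
    hashes.indices.map(row => table(row)(column(row, item))).min
}
```

Because each row's estimate can only be inflated by collisions, taking the minimum keeps whichever row suffered the fewest. That is why the k functions need to collide independently of one another.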

This is lower-level than anything I've programmed before, and I don't know much about hash functions beyond what I learned in algorithms classes, so my question is: how do I generate these k pairwise independent hash functions?

Am I supposed to use a hash function like MD5 or MurmurHash? Or do I just generate k hash functions of the form f(x) = (ax + b) mod p, where p is a prime and a and b are random integers (i.e., the universal hashing family everyone learns in Algorithms 101)?
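The (ax + b) mod p family mentioned in the question is itself short to implement. Here is a minimal Scala sketch, assuming Int keys, p = 2^61 - 1 (a Mersenne prime larger than every key), and a final reduction mod the sketch width w. BigInt is used to avoid Long overflow, trading speed for simplicity. The object and method names are hypothetical. With a drawn from [1, p - 1] this is the standard universal family ((ax + b) mod p) mod w. The last mod w step makes the outputs only approximately uniform.

```scala
import scala.util.Random

// Sketch of h(x) = ((a*x + b) mod p) mod w. Assumptions: Int keys,
// p = 2^61 - 1, BigInt arithmetic; UniversalHashing/family are hypothetical names.
object UniversalHashing {
  val P: Long = (1L << 61) - 1 // Mersenne prime 2^61 - 1

  // Draw k functions, each with its own random (a, b) and a != 0.
  // floorMod on nextLong() has a negligible modulo bias.
  def family(k: Int, width: Int, rng: Random = new Random()): IndexedSeq[Int => Int] = {
    require(k > 0 && width > 0)
    val p = BigInt(P)
    val w = BigInt(width)
    IndexedSeq.fill(k) {
      val a = BigInt(Math.floorMod(rng.nextLong(), P - 1) + 1) // a in [1, p - 1]
      val b = BigInt(Math.floorMod(rng.nextLong(), P))         // b in [0, p - 1]
      (x: Int) => {
        // Map the Int one-to-one into [0, 2^32), which lies inside [0, p)
        val key = BigInt(x.toLong & 0xFFFFFFFFL)
        (((a * key + b) mod p) mod w).toInt
      }
    }
  }
}
```

Calling UniversalHashing.family(k, w) once gives k independent draws of (a, b), one per row of the sketch. Passing a seeded Random makes the functions reproducible.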

I'm looking more for simplicity than raw speed (e.g., I'll take something 5x slower if it's simpler to implement).

Solution

Scala already has MurmurHash implemented (it's scala.util.MurmurHash). It's very fast and very good at distributing values. A cryptographic hash is overkill: you'll just take tens or hundreds of times longer than you need to. Just pick k different seeds to start with. Because MurmurHash mixes its input very thoroughly (although it is not a cryptographic hash), you'll get k hash codes that are largely independent in practice. (In 2.10, you should probably switch to scala.util.hashing.MurmurHash3; the usage is rather different, but you can still do the same thing with mixing.)
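A minimal sketch of the seeding approach with the 2.10+ API, scala.util.hashing.MurmurHash3.stringHash(s, seed). It assumes String keys and uses seeds 0 until k, although any k distinct seeds would do. The object name SeededMurmur is hypothetical. This is a practical heuristic: seeded MurmurHash3 does not come with a proof of pairwise independence the way the (ax + b) mod p family does.

```scala
import scala.util.hashing.MurmurHash3

// k hash functions from one MurmurHash3 with k different seeds.
// Assumptions: String keys, seeds 0 until k; SeededMurmur is a hypothetical name.
object SeededMurmur {
  def family(k: Int, width: Int): IndexedSeq[String => Int] = {
    require(k > 0 && width > 0)
    (0 until k).map { seed =>
      // floorMod keeps the bucket index non-negative for negative hash values
      (s: String) => Math.floorMod(MurmurHash3.stringHash(s, seed), width)
    }
  }
}
```

This keeps the whole hash family to a few lines, which matches the asker's preference for simplicity over raw speed.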

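If you only need nearby values to be mapped to randomly distant values, this will work. If you also want the hashes to collide independently (i.e., if A and B collide under hash 1, they should probably not also collide under hash 2), you'll need to go at least one step further: hash the object's subcomponents rather than the whole object, so that the hashes have a chance to start out different. For example, if every seeded hash is computed from the object's single 32-bit hashCode, two objects with equal hashCodes will collide under all k hashes.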
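A small Scala sketch of this point, using the strings "Aa" and "BB", which share the same String.hashCode (2112). The helpers viaHashCode and viaContents are hypothetical. The first reduces the whole object to its hashCode and then mixes in a seed. The second feeds the seeded hash the string's characters directly.

```scala
import scala.util.hashing.MurmurHash3

// Two ways to derive seeded hashes; CollisionDemo and its methods are hypothetical.
object CollisionDemo {
  // Whole object: only x.hashCode reaches the mixer, so two objects with
  // equal hashCodes produce equal results for every seed.
  def viaHashCode(x: Any, seed: Int): Int =
    MurmurHash3.finalizeHash(MurmurHash3.mix(seed, x.hashCode), 1)

  // Subcomponents: the seeded hash sees the String's characters themselves.
  def viaContents(s: String, seed: Int): Int =
    MurmurHash3.stringHash(s, seed)
}
```

With these helpers, viaHashCode("Aa", s) == viaHashCode("BB", s) holds for every seed s, so a hashCode collision repeats in all k rows. The viaContents values for the two strings differ for every seed.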
