Using hash functions with Bloom filters

Question

A Bloom filter uses one or more hash functions to generate a value between 0 and m from an input string X. My question is how to use a hash function to generate a value in this way. For example, an MD5 hash is typically represented as a 32-character hex string; how would I use the MD5 algorithm to generate a value between 0 and m, where I can specify m? I'm using Java at the moment, so an example of how to do this with the MessageDigest functionality it offers would be great, though a generic description of how to go about it would be fine too.

Thanks

Answer

You should first convert the hash output to an unsigned integer, then reduce it modulo m. This looks like this:

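import java.math.BigInteger;
import java.security.MessageDigest;

MessageDigest md = MessageDigest.getInstance("MD5");
// hash data...
byte[] hashValue = md.digest();
// signum 1 makes BigInteger treat the digest bytes as an unsigned magnitude
BigInteger n = new BigInteger(1, hashValue);
n = n.mod(m);
// at this point, n has a value between 0 and m-1 (inclusive)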

I have assumed that m is a BigInteger instance. If necessary, use BigInteger.valueOf(). Similarly, use n.intValue() or n.longValue() to get the value of n as one of the primitive types of Java.
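For a filter whose size fits in an int, the steps above can be wrapped into a single method. The following is only a sketch: the class name BloomIndex, the method name index, and the choice of UTF-8 for encoding the input string are assumptions made for illustration and are not part of the original answer.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name and signature are illustrative only.
class BloomIndex {

    // Maps the string x to a position in the range [0, m).
    static int index(String x, int m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the input string is encoded as UTF-8 before hashing.
            byte[] hashValue = md.digest(x.getBytes(StandardCharsets.UTF_8));
            // Signum 1 interprets the 16 digest bytes as an unsigned integer.
            BigInteger n = new BigInteger(1, hashValue);
            // The remainder is smaller than m, so it always fits in an int.
            return n.mod(BigInteger.valueOf(m)).intValue();
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, which ships an MD5 provider.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int m = 1000; // example filter size
        System.out.println(index("hello", m));
    }
}
```

Calling index(x, m) with the same x and m always returns the same position, which is what a Bloom filter needs for both insertion and lookup.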

The modular reduction is somewhat biased, but the bias is very small if m is substantially smaller than 2^128.
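To put a number on that bias, here is a short derivation under the idealized assumption that the MD5 output H is uniformly distributed over the integers from 0 to 2^128 - 1. The symbols N, q and r are introduced here for illustration only.

```latex
Let $N = 2^{128}$ and write $N = qm + r$ with $0 \le r < m$. Then
\[
  \Pr[H \bmod m = j] =
  \begin{cases}
    \dfrac{q+1}{N}, & 0 \le j < r, \\[1.5ex]
    \dfrac{q}{N},   & r \le j < m.
  \end{cases}
\]
Compared with the ideal probability $1/m$, the relative deviation is
\[
  \frac{(q+1)/N - 1/m}{1/m} = \frac{m - r}{N}
  \qquad\text{or}\qquad
  \frac{1/m - q/N}{1/m} = \frac{r}{N},
\]
and both are smaller than $m / 2^{128}$.
For example, $m = 2^{32}$ gives a relative bias below $2^{-96}$.
```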
