Bloom Filter:评估假阳性率 [英] Bloom Filter: evaluating false positive rate

查看:1407
本文介绍了Bloom Filter:评估假阳性率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定固定数量的位(例如,槽)(m)和固定数量的散列函数(k),如何计算理论假阳性率(p)?



根据维基百科 http://en.wikipedia.org/wiki/Bloom_filter ,对于假阳性率(p)和项数(n),所需的位数(m)由下式给出: m = - n * l(p)/(l(2) ^ 2),并且最优数目的散列函数(k)由 k = m / n * l(2)给出。 >

从维基百科页面给出的公式,我想我可以通过以下方式评估理论假阳性率(p): p =(1-e - (k * n / m))^ k



但维基百科有另一个公式为(p): p = e(-m / n *(l(2)^ 2))我想,假设(k)是哈希函数的最优数。



对于我的例子,我拿了 n = 1000000 m = n * 2 (k)wo的最优值根据上述公式,理论假阳性率(p)为0.382,为1.386。
我们选择函数的数量,计算给定固定(k)的理论假阳性率(p),并计算理论上所需的位数(m'):


$ b $对于k = 1,p = .393和m'= 1941401
对于k = 2,p = .399和m'= 1909344
,对于k = 1,b

  k = 3,p = .469和m'= 1576527 
k = 4,p = .559和m'= 1210636

过滤器中填充的位数越多,我们得到的假阳性越大。似乎是合乎逻辑的。



但是可以确认公式 p =(1 - e( - (k * n / m))) ^ k 是正确的,以获得固定(k),(m)和(n)?的理论假阳性率?



注意:这个问题似乎已经在这里提出:有了固定数量的函数,如何计算出布尔筛选器的大小,给出假阳性的概率?,但是没有符合我的确切问题的答案。
我的绽放过滤器需要多少个哈希函数? 可能是感兴趣的,但再次不完全一样。



Regards

解决方案

m - 位数组元素的数量
n - 集合中的项目数
p - 假阳性概率// 0.0 - 1.0
^ - 电力



p = e ^( - (m / n)*(ln(2)^ 2));



Bloom过滤器上的数学友好教程:
http:// techeffigy.wordpress.com/2014/06/05/bloom-filter-tutorial/


Given a fixed number of bits (eg. slot) (m) and a fixed number of hash function (k), how one compute the theoretical false positive rate (p) ?

According to Wikipedia http://en.wikipedia.org/wiki/Bloom_filter, for a false positive rate (p) and a number of item (n), the number of bits (m) needed is given by m = - n * l(p) / (l(2)^2) and the optimal number of hash function (k) is given by k = m / n * l(2).

From the formula given in Wikipedia page, I guess I could evaluate the theoretical false positive rate (p) by the following: p = (1 - e(-(k * n/m)))^k

But Wikipedia has another formula for (p) : p = e(-m/n*(l(2)^2)) which, I suppose, assume that (k) is the optimal number of hash function.

For my example, I took n = 1000000 and m = n * 2, the optimal value for (k) would be 1.386, and the theoretical false positive rate (p) would be 0.382 according the previous formula. Let's choose the number of function, compute the theoretical false positive rate (p) given a fixed (k) and compute the theoretical number of bits needed (m'):

for k = 1, p = .393 and m' = 1941401
for k = 2, p = .399 and m' = 1909344
for k = 3, p = .469 and m' = 1576527
for k = 4, p = .559 and m' = 1210636

The more bits are stuffed in the filter, the more false positive we get. Seems logical.

But could one confirm that formula p = (1 - e(-(k * n/m)))^k is correct to get the theoretical false positive rate given a fixed (k),(m) and (n) ?

Note: the question seems already asked here: With fixed number of functions, how can I calculate the size of a Bloom Filter given the probability of false positives? but there's no answer that match my exact question. How many hash functions does my bloom filter need? might be of interest, but again it's not exactly the same.

Regards

解决方案

m – number of elements in bit array n – number of items in collection p – false positive probability // 0.0 – 1.0 ^ – power

p = e^(-(m/n) * (ln(2)^2));

I wrote a math friendly tutorial on Bloom Filters : http://techeffigy.wordpress.com/2014/06/05/bloom-filter-tutorial/

这篇关于Bloom Filter:评估假阳性率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆