多少散列函数确实我的布隆过滤器所需要的? [英] How many hash functions does my bloom filter need?
问题描述
维基百科说:
这是空的布隆过滤器是m比特的位阵列,全部设置为0。此外,还必须为K定义不同的散列函数,其中每个映射或散列一些组元素到米阵列位置中的一个与一个均匀随机分布。
An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with a uniform random distribution.
我读了一篇文章,但我不明白的是如何k被确定。它是表的大小的功能?
I read the article, but what I don't understand is how k is determined. Is it a function of the table size?
此外,在哈希表我写我使用了一个简单而有效的算法自动增长的哈希的大小。基本上,如果曾经超过50%,在表中的桶填充,我将加倍的表的大小。我怀疑你可能仍然希望这样做有一个布隆过滤器,以减少误报。是否正确?
Also, in hash tables I've written I used a simple but effective algorithm for automatically growing the hash's size. Basically, if ever more than 50% of the buckets in the table were filled, I would double the size of the table. I suspect you might still want to do this with a bloom filter to reduce false positives. Correct?
推荐答案
如果您了解布鲁姆的维基百科的文章中进一步回落过滤器,然后你发现一个部分的误报概率的。本节介绍的哈希函数的数量影响的误报的概率,并为您的公式来确定的 K 的距离所需要的预期概率。误报。
If you read further down in the Wikipedia article about Bloom filters, then you find a section Probability of false positives. This section explains how the number of hash functions influences the probabilities of false positives and gives you the formula to determine k from the desired expected prob. of false positives.
从维基百科的文章引用:
Quote from the Wikipedia article:
显然,错误的概率 阳性降低为m(数 阵列中的位)的增加,并 增加为n(的插入数 元素)增加。对于给定的米和 N,k的值(散列数 函数),最大限度地减少了 概率为
Obviously, the probability of false positives decreases as m (the number of bits in the array) increases, and increases as n (the number of inserted elements) increases. For a given m and n, the value of k (the number of hash functions) that minimizes the probability is
这篇关于多少散列函数确实我的布隆过滤器所需要的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!