在Bloom过滤器中使用哪个哈希函数 [英] Which hash functions to use in a Bloom filter

查看：204 发布时间：2017/11/8 20:07:44 function hash filter bloom

本文介绍了在Bloom过滤器中使用哪个哈希函数的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

要使用哪个函数？

Bloom过滤器

散列函数

我知道这个（独立的，均匀分布的）是什么意思，但是我很难找到论证或讨论，哈希函数满足这些要求，因此是合适的。在很多文章中，我已经读过关于使用 FNV 或 Murmur哈希函数的建议，但不是为什么（或者至少没有证据）。

在此先感谢！

解决方案

Github自述文件

我从两个角度看问题：

速度有多快是计算吗？

输出分配是多么统一？

速度可以很容易地测量通过随机输入的基准。一致性有点困难，需要一些统计数据。使用卡方拟合优度检验，我测量了散列值分布与统一分布的相似程度。

结果是：

使用 Murmur3 可以在速度和均匀性之间找到最佳平衡点。不使用Murmur2，因为它对于以小增量变化的输入是不统一的。

使用加密散列函数 256为最好的一致性。

应用 Kirsch-Mitzenmacher-Optimization 仅计算2个而不是k个散列函数（ hash_i = hash1 + ix hash2 ）。 >

如果您的实现正在使用Java，我会推荐使用我们的Bloom过滤器哈希库。这是有据可查的和彻底的测试。有关详细信息，包括根据卡方检验的不同散列函数的基准结果及其不一致性，请参阅 Github的repo自述。
I've got the following question about choosing hash functions for Bloom filters:

Which functions to use?

In nearly every document/paper you can read that the hash functions used in a Bloom filter should be independent and uniformly distributed.

I know what is meant by this (independent and uniformly distributed), but I'm having trouble to find a argumentation or a discussion, which hash functions fulfill those requirements and are therefore suitable. In a lot of posts I've read about suggestions for the usage of the FNV or Murmur hash function, but not why (or at least without a proof) they are suitable.

Thanks in advance!
解决方案
I asked myself the same question when building a Java Bloom filter library. See the Github readme for a detailed treatment of my analysis of hash functions for Bloom filters.

I looked at the problem from two perspectives:

How fast is the computation?

How uniform is the output distribution?

Speed can easily be measured by benchmarks on random input. Uniformity is a bit harder and requires some statistics. Using Chi-Square goodness of fit tests I measured how similar the distribution of hash values is to a uniform distribution.

The result is:

Use Murmur3 for the best trade-off between speed and uniformity. Do not use Murmur2 as it is not uniform for inputs that change in small increments.

Use a cryptographic hash function like SHA-256 for the best uniformity.

Apply the Kirsch-Mitzenmacher-Optimization to only compute 2 instead of k hash functions (hash_i = hash1 + i x hash2).

If your implementation is using Java I would recommend using our Bloom filter hash library. It is well documented and thoroughly tested. For the details, including the benchmark results for different hash function and their unformity according to Chi-Square test, see the Github readme of the repo.

这篇关于在Bloom过滤器中使用哪个哈希函数的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在Bloom过滤器中使用哪个哈希函数 [英] Which hash functions to use in a Bloom filter

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在Bloom过滤器中使用哪个哈希函数 [英] Which hash functions to use in a Bloom filter

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭