MurmurHash - what is it?


Question

I've been trying to get a high-level understanding of what MurmurHash does.

I've read a basic description but have yet to find a good explanation of when to use it and why. I know it's very fast but want to know a bit more.

I asked a related question about how I could fit a UUID into a Redis bitset, and someone suggested using MurmurHash. It works, but I'd like to understand the risks and benefits.

Recommended answer

Murmur is a family of good general-purpose hash functions, suitable for non-cryptographic use. As stated by Austin Appleby, MurmurHash provides the following benefits:

  • simple (in terms of the number of generated assembly instructions).
  • good distribution (passing chi-squared tests for practically all keysets and bucket sizes).
  • good avalanche behavior (max bias of 0.5%).
  • good collision resistance (passes Bob Jenkins' frog.c torture test; no collisions possible for 4-byte keys, no small (1- to 7-bit) differentials).
  • great performance on Intel/AMD hardware, with a good tradeoff between hash quality and CPU consumption.

You can certainly use it to hash UUIDs (as you could with any other well-designed hash function: CityHash, Jenkins, Paul Hsieh's, etc.). Now, a Redis bitset is limited to 2^32 bits (512 MB), so you need to reduce the 128 bits of a UUID to a 32-bit hash value. Whatever the quality of the hash function, there will be collisions.
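The reduction from a 128-bit UUID to a 32-bit bit offset can be sketched as follows. This is a minimal pure-Python sketch of the 32-bit MurmurHash3 variant (x86_32), not an optimized implementation; the helper name `uuid_to_offset` and the choice of seed 0 are assumptions made for illustration and are not part of the original answer.

```python
import uuid

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 of `data`, returned as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    nblocks = n // 4

    # Body: mix in each complete 4-byte little-endian block.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix in the remaining 1 to 3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and force avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def uuid_to_offset(u: uuid.UUID, seed: int = 0) -> int:
    """Hypothetical helper: map a UUID to a bit offset in [0, 2**32)."""
    return murmur3_32(u.bytes, seed)


if __name__ == "__main__":
    u = uuid.uuid4()
    print(u, "->", uuid_to_offset(u))
```

The resulting offset is what would be passed to a Redis bit operation such as SETBIT or GETBIT. Two different UUIDs can map to the same offset, so a set bit only means "possibly seen", never "certainly seen".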

Using an engineered hash function like Murmur will maximize the quality of the distribution and minimize the number of collisions, but it offers no other guarantee.
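To get a feel for how many collisions to expect even with a good hash, here is a rough estimate. It assumes an idealized hash that spreads keys uniformly over the 2^32 available bit positions, which is a modeling assumption rather than a measured property of Murmur; the function name `expected_collisions` is made up for this sketch.

```python
import math


def expected_collisions(n: int, m: int = 2**32) -> float:
    """Expected number of keys that land on an already-occupied slot
    when n keys are hashed uniformly at random into m slots."""
    # Expected occupied slots: m * (1 - (1 - 1/m)**n), computed with
    # log1p/expm1 to stay accurate when 1/m is tiny.
    occupied = -m * math.expm1(n * math.log1p(-1.0 / m))
    return n - occupied


if __name__ == "__main__":
    for n in (10**5, 10**6, 10**7, 10**8):
        print(f"{n:>11,d} keys -> ~{expected_collisions(n):,.1f} collisions")
```

For small n relative to m this is close to n^2 / (2m), so under these assumptions one million UUIDs would already produce on the order of a hundred colliding offsets. Whether that is acceptable depends on how the bitset is used.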

Here are some links comparing the quality of general-purpose hash functions:

http://www.azillionmonkeys.com/qed/hash.html

http://www.strchr.com/hash_functions

http://blog.aggregateknowledge.com/2011/12/05/choosing-a-good-hash-function-part-1/

http://blog.aggregateknowledge.com/2011/12/29/choosing-a-good-hash-function-part-2/

http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/
