MurmurHash - 它是什么? [英] MurmurHash - what is it?

查看:769
本文介绍了MurmurHash - 它是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直试图深入了解 MurmurHash 的功能。

我已阅读了基本说明,但尚未找到何时使用它以及为什么的良好解释。我知道它非常快,但想知道更多。



我问了一个相关的关于如何将UUID适配到Redis bitset的问题,以及有人建议使用MurmurHash。它的工作原理,但我想了解风险/好处。

解决方案Murmur是一个很好的通用哈希函数家族,适合非加密使用。按照Austin Appleby的说法,MurmurHash提供了以下好处:简单(根据生成的汇编指令的数量)。
  • 良好的分布(通过几乎所有键集和桶大小的卡方检验)

  • avalanche 行为(最大偏差为0.5%)。
  • 良好的碰撞阻力(通过Bob Jenkin的frog.c酷刑测试。 (1到7位)差异)。
  • 在英特尔/ AMD硬件上性能卓越,散列质量和CPU消耗之间的折衷。


    $ b

    你可以使用它来对UUID进行哈希(就像任何其他高级哈希函数:CityHash,Jenkins,Paul Hsieh等等)。 Redis bitset限制为4 GB(512 MB),因此您需要将128位数据(UUID)减少到32位(散列值)。无论哈希函数的质量如何,将会发生冲突。

    使用像Murmur这样的工程哈希函数可以最大限度地提高分布的质量并减少冲突的次数,但它不能提供其他保证。 p>

    以下是一些比较通用散列函数质量的链接: http://www.azillionmonkeys.com/qed/hash.html



    http://www.strchr.com/hash_functions



    http://blog.aggregateknowledge.com/2011/12/05/choosing- a-good-hash-function-part-1 /

    http://blog.aggregateknowledge.com/2011/12/29/choosing-a-good-hash-function-part-2/



    http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/


    I've been trying to get a high level understanding of what MurmurHash does.

    I've read a basic description but have yet to find a good explanation of when to use it and why. I know its very fast but want to know a bit more.

    I asked a related question about how I could fit a UUID into a Redis bitset, and someone suggested using MurmurHash. It works but I'd like to understand the risks/benefits.

    解决方案

    Murmur is a family of good general purpose hashing functions, suitable for non-cryptographic usage. As stated by Austin Appleby, MurmurHash provides the following benefits:

    • simple (in term of number of generated assembly instructions).
    • good distribution (passing chi-squared tests for practically all keysets & bucket sizes.
    • good avalanche behavior (max bias of 0.5%).
    • good collision resistance (passes Bob Jenkin's frog.c torture-test. No collisions possible for 4-byte keys, no small (1- to 7-bit) differentials).
    • great performance on Intel/AMD hardware, good tradeoff between hash quality and CPU consumption.

    You can certainly use it to hash UUIDs (like any other advanced hashing functions: CityHash, Jenkins, Paul Hsieh's, etc ...). Now, a Redis bitset is limited to 4 GB bits (512 MB). So you need to reduce 128 bits of data (UUID) to 32 bits (hashed value). Whatever the quality of the hashing function, there will be collisions.

    Using an engineered hash function like Murmur will maximize the quality of the distribution, and minimize the number of collisions, but it offers no other guarantee.

    Here are some links comparing the quality of general purpose hash functions:

    http://www.azillionmonkeys.com/qed/hash.html

    http://www.strchr.com/hash_functions

    http://blog.aggregateknowledge.com/2011/12/05/choosing-a-good-hash-function-part-1/

    http://blog.aggregateknowledge.com/2011/12/29/choosing-a-good-hash-function-part-2/

    http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/

    这篇关于MurmurHash - 它是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆