Python便携式哈希 [英] Python Portable Hash

查看:167
本文介绍了Python便携式哈希的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读pyspark的源代码此处。我在这里对便携式功能感到困惑。

  def portable_hash(x):

此函数返回内置类型的一致散列码,特别是
(无)和元组(无)
该算法类似于CPython使用的算法2.7
>>> portable_hash(无)
0
>>> portable_hash((None,1))& 0xffffffff
219750521

如果x是None:
return 0
if isinstance(x,tuple):
h = 0x345678
for i in x:$ b $ ^ = portable_hash(i)
h * = 1000003
h& = sys.maxint
h ^ = len(x)
如果h == -1:
h = -2
返回h
返回散列(x)

我可以看到它是一个递归函数。如果输入是一个元组,然后递归循环遍历每个元素。



以下是我的一些问题:


  1. 到一个映射?

  2. 这个函数只考虑无和元组和
    可哈希值,我知道list对象不是
    hashable,他们是否故意这么做或者不是。
  3. 我没有太多的散列经验,这是一个非常经典的散列方法,如果有的话,是否有任何资源让我更好地理解它? / li>


解决方案


这是一种一对一的映射吗?


散列方法是1对1:它们都将M个可能的输入映射到N个可能的整数结果,其中M比N大得多。

lockquote

这个函数只考虑None和元组和可哈希值,我知道这个列表对象不可散列,他们是否故意这样做。

blockquote>

是的,这个函数委托给内置 hash 除了元组以外的其他所有元素, None 。而且它绝对是Python中的一个有意设计的决定(也是由这个函数所尊重)来制作诸如 list dict


我没有太多的散列经验,这是一个非常经典的哈希方法,如果有的话,是否有任何资源让我更好地理解它?

是,异或'哈希的项目,随着运行总量的修改,是一个非常经典的方法来散列可哈希物品的容器。



关于哈希的更多研究,我会从 http://en.wikipedia.org/wiki/Hash_function 开始。


I am reading the source code of pyspark here. I got really confused by the portable function here.

def portable_hash(x):
    """
    This function returns consistant hash code for builtin types, especially
    for None and tuple with None.
    The algrithm is similar to that one used by CPython 2.7
    >>> portable_hash(None)
    0
    >>> portable_hash((None, 1)) & 0xffffffff
    219750521
    """
    if x is None:
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for i in x:
            h ^= portable_hash(i)
            h *= 1000003
            h &= sys.maxint
        h ^= len(x)
        if h == -1:
            h = -2
        return h
    return hash(x)

I can see that it is a recursive function. If the input is a tuple, then recursively loop through every element.

Here are few of my questions:

  1. is this hash approach an one to one mapping?
  2. this function only takes None and tuple and hashable values into consideration, I know that list object is not hashable, did they do that intentionally or not.
  3. I don't have much experience with hash, is this a very classic hashing approach, if so, is there any resource for me to get a better understanding for it?

解决方案

"is this hash approach an one to one mapping?"

NO hashing approach is 1-to-1: they all map M possible inputs into N possible integer results where M is much larger than N.

"this function only takes None and tuple and hashable values into consideration, I know that list object is not hashable, did they do that intentionally or not."

Yes, this function delegates to built-in hash everything else except tuples and None. And it's definitely a deliberate design decision in Python (also respected by this function) to make mutable built-ins such as list and dict not hashable.

"I don't have much experience with hash, is this a very classic hashing approach, if so, is there any resource for me to get a better understanding for it?"

Yes, exclusive-or'ing hash of items, with modification of the running total as one goes through, is indeed a very classic approach to hashing containers of hashable items.

For more study about hashing, I'd start at http://en.wikipedia.org/wiki/Hash_function .

这篇关于Python便携式哈希的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆