Why do -1 and -2 both hash to -2 in CPython?


Problem description




Possible Duplicate:
When is a python object's hash computed and why is the hash of -1 different?

Why do -1 and -2 both hash to the same number in Python?

Since they do, how does Python tell these two numbers apart?

>>> -1 is -2
False
>>> hash(-1) is hash(-2)
True
>>> hash(-1)
-2
>>> hash(-2)
-2

Solution

-1 is a reserved value at the C level of CPython which prevents hash functions from being able to produce a hash value of -1. As noted by DSM, the same is not true in IronPython and PyPy where hash(-1) != hash(-2).

See this Quora answer:

If you write a type in a C extension module and provide a tp_hash method, you have to avoid -1 — if you return -1, Python will assume you meant to throw an error.

If you write a class in pure Python and provide a __hash__ method, there's no such requirement, thankfully. But that's because the C code that invokes your __hash__ method does that for you — if your __hash__ returns -1, then hash() applied to your object will actually return -2.

Which really just repackages the information from effbot:

The hash value -1 is reserved (it’s used to flag errors in the C implementation). If the hash algorithm generates this value, we simply use -2 instead.
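The behavior described in these quotes can be checked with a small sketch. It assumes CPython (other implementations such as PyPy may give a different result), and `AlwaysMinusOne` is a hypothetical class name used only for illustration:

```python
class AlwaysMinusOne:
    """Hypothetical class whose __hash__ returns the reserved value -1."""

    def __hash__(self):
        return -1


obj = AlwaysMinusOne()

# Calling the method directly bypasses the C-level wrapper.
direct = obj.__hash__()

# The built-in hash() goes through the C slot, which on CPython
# replaces a returned -1 with -2.
via_builtin = hash(obj)

print(direct, via_builtin)
```

On CPython this should print -1 followed by -2, since only the built-in hash() applies the substitution.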

You can also see this in the source. For example, for Python 3’s int object, this is at the end of the hash implementation:

if (x == (Py_uhash_t)-1)
    x = (Py_uhash_t)-2;
return (Py_hash_t)x;
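As a rough illustration, the same logic can be modeled in pure Python. This is a sketch rather than the real implementation: `model_int_hash` is a hypothetical helper, and it assumes CPython's documented scheme of reducing integers modulo `sys.hash_info.modulus`:

```python
import sys


def model_int_hash(n):
    """Pure-Python model of CPython's int hash (an illustration, not the real code)."""
    modulus = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds
    h = abs(n) % modulus
    if n < 0:
        h = -h
    # The step shown in the C snippet: -1 is reserved, so use -2 instead.
    if h == -1:
        h = -2
    return h


# Compare the model with the built-in hash() for a few small integers.
for n in (-3, -2, -1, 0, 1):
    print(n, model_int_hash(n), hash(n))
```

Under these assumptions, the model and the built-in agree, and -1 is the only small integer whose hash differs from its own value.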


Since they do, how does Python tell these two numbers apart?

Since all hash functions map a large input space to a smaller output space, collisions are always expected, no matter how good the hash function is. Think of hashing strings, for example. If hash codes are 32-bit integers, you have 2^32 (a little more than 4 billion) hash codes. If you consider all ASCII strings of length 6, you have (2^7)^6 (just under 4.4 trillion) different items in your input space. With only this set, you are guaranteed to have many, many collisions no matter how good the hash function is. Add Unicode characters and strings of unlimited length to that!
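The arithmetic in this paragraph can be verified directly. The variable names below are illustrative only:

```python
hash_codes = 2 ** 32            # distinct 32-bit hash codes
ascii_strings = (2 ** 7) ** 6   # ASCII strings of length exactly 6

print(hash_codes)      # 4294967296
print(ascii_strings)   # 4398046511104

# Pigeonhole principle: on average this many strings share each
# hash code, so at least one hash code is shared by that many.
print(ascii_strings // hash_codes)  # 1024
```

So even in this small input space, each hash code is shared by 1024 strings on average.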

Therefore, the hash code only hints at the location of an object; an equality test then follows to check the candidate keys. To implement a membership test in a hash-table set, the hash code gives you the number of the "bucket" in which to search for the value. However, all set items with the same hash code end up in the same bucket, so you also need an equality test to distinguish between the candidates in that bucket.
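The bucket-then-equality lookup can be sketched with a toy hash set. `ToyHashSet` is a hypothetical class written for this illustration; it uses simple chaining, whereas CPython's real set and dict use open addressing, so treat it as a model of the idea only:

```python
class ToyHashSet:
    """Toy chained hash set; CPython's real set uses open addressing instead."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # The hash code only selects which bucket to look in.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        # Equality, not the hash, decides whether the item is already present.
        if not any(item == existing for existing in bucket):
            bucket.append(item)

    def __contains__(self, item):
        return any(item == existing for existing in self._bucket(item))


s = ToyHashSet()
s.add(-1)
print(-1 in s, -2 in s)   # -2 is not found, because -2 != -1

s.add(-2)
print(-2 in s)

# The built-in dict keeps both keys apart in the same way.
d = {-1: "minus one", -2: "minus two"}
print(len(d), d[-1], d[-2])
```

On CPython, -1 and -2 land in the same bucket here because their hashes are equal, yet the equality test still keeps them distinct.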

This hash code and equality duality is hinted at in the CPython documentation on hashable objects. In other languages/frameworks, there is a guideline/rule that if you provide a custom hash code function, you must also provide a custom equality test (performed on the same fields as the hash code function).
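In Python, that guideline amounts to defining `__eq__` and `__hash__` over the same fields. A minimal sketch, with `Point` as a hypothetical example type:

```python
class Point:
    """Hypothetical value type: __eq__ and __hash__ use the same fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares, so equal
        # objects always have equal hashes.
        return hash((self.x, self.y))


a, b = Point(1, 2), Point(1, 2)
print(a == b, hash(a) == hash(b), len({a, b}))
```

Because equal points hash equally, the set treats `a` and `b` as a single element.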


Indeed, today's Python release addresses exactly this, with a security patch for the efficiency issue that arises when this (identical hash values, but on a massive scale) is used as a denial-of-service attack - http://mail.python.org/pipermail/python-list/2012-April/1290792.html
