When is hash(n) == n in Python?
Question
I've been playing with Python's hash function. For small integers, it appears that hash(n) == n always holds. However, this does not extend to large numbers:
>>> hash(2**100) == 2**100
False
I'm not surprised; I understand that hash takes a finite range of values. What is that range?
I tried using binary search to find the smallest number n for which hash(n) != n:
>>> import codejamhelpers # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:
binary_search(f, t)
Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.
>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0
What's special about 2305843009213693951? I note it's less than sys.maxsize == 9223372036854775807.
I'm using Python 3. I ran the same binary search on Python 2 and got a different result, 2147483648, which I note is sys.maxint+1.
I also played with [hash(random.random()) for i in range(10**6)] to estimate the range of the hash function. The max is consistently below n above. Comparing the min, it seems Python 3's hash is always positively valued, whereas Python 2's hash can take negative values.
Recommended answer
Based on the Python documentation in the pyhash.c file:
For numeric types, the hash of a number x is based on the reduction of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types.
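Both claims in that passage can be checked from the interpreter. The following is only a sketch for CPython 3, where the prime is exposed as sys.hash_info.modulus; the sample values are arbitrary choices, not taken from the documentation.

```python
import sys
from decimal import Decimal
from fractions import Fraction

# The prime P used for numeric hashes on this build (CPython 3 only).
P = sys.hash_info.modulus

# Numerically equal values of different types should share a single hash.
equal_values = [3, 3.0, Fraction(3, 1), Decimal(3)]
hashes = {hash(v) for v in equal_values}
print(hashes)

# For non-negative ints, the hash is n reduced modulo P.
samples = [0, 1, P - 1, P, P + 1, 2**100]
reduced_ok = all(hash(n) == n % P for n in samples)
print(reduced_ok)
```

If the set printed first contains exactly one element, the cross-type guarantee holds for these values.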
So for a 64- or 32-bit machine, the modulus would be 2**_PyHASH_BITS - 1, but what is _PyHASH_BITS?
You can find it in the pyhash.h header file, where it is defined as 61 for a 64-bit machine (you can read more explanation in the pyconfig.h file).
#if SIZEOF_VOID_P >= 8
# define _PyHASH_BITS 61
#else
# define _PyHASH_BITS 31
#endif
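The same platform split can be observed from Python without reading the C headers. This sketch assumes CPython 3 and uses struct.calcsize("P") as a stand-in for SIZEOF_VOID_P; the variable names are illustrative only.

```python
import struct
import sys

# Size of a C pointer in bytes on this build (8 on 64-bit, 4 on 32-bit).
pointer_bytes = struct.calcsize("P")

# Number of bits in the hash modulus reported by the interpreter.
hash_bits = sys.hash_info.modulus.bit_length()

# Mirror of the #if above: 61 bits for 8-byte pointers, otherwise 31.
expected_bits = 61 if pointer_bytes >= 8 else 31

# The modulus should be the Mersenne number 2**hash_bits - 1.
is_mersenne = sys.hash_info.modulus == 2**hash_bits - 1

print(pointer_bytes, hash_bits, expected_bits, is_mersenne)
```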
So first of all, it depends on your platform. For example, on my 64-bit Linux platform the modulus is 2**61 - 1, which is 2305843009213693951:
>>> 2**61 - 1
2305843009213693951
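To see where hash(n) == n stops holding, it helps to probe the values around the modulus. The sketch below assumes CPython 3, where a negative int hashes to the negation of the hash of its absolute value and -1 is reserved as an error code (so hash(-1) is -2). The helper name hash_is_identity is made up for this example.

```python
import sys

P = sys.hash_info.modulus

def hash_is_identity(n):
    """Return True if hash(n) == n for the integer n."""
    return hash(n) == n

# Probe both ends of the range, plus the special case -1.
boundary = {
    "P - 1": hash_is_identity(P - 1),        # last positive value that maps to itself
    "P": hash_is_identity(P),                # wraps around to 0
    "-(P - 1)": hash_is_identity(-(P - 1)),  # mirrored lower end
    "-P": hash_is_identity(-P),              # also wraps around to 0
    "-1": hash_is_identity(-1),              # -1 is never returned as a hash
}
for label, result in boundary.items():
    print(label, result)
```

Under these assumptions, hash(n) == n for every integer with -P < n < P except n == -1.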
You can also use math.frexp to get the mantissa and exponent of sys.maxint (Python 2 only; Python 3 has sys.maxsize instead), which for a 64-bit machine shows that the max int is just under 2**63:
>>> import math
>>> import sys
>>> math.frexp(sys.maxint)
(0.5, 64)
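And you can see the difference with a simple test (this output is from 64-bit Python 2, where plain ints up to sys.maxint hash to themselves; on 64-bit Python 3 the cutoff is the modulus 2**61 - 1 instead):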
>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False
For the complete documentation of Python's hashing algorithm, see https://github.com/python/cpython/blob/master/Python/pyhash.c#L34
As mentioned in the comments, you can use sys.hash_info (in Python 3.x), which gives you a struct sequence of the parameters used for computing hashes.
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
Alongside the modulus described above, you can also get the inf value, as follows:
>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159
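The same parameters also explain the hashes of other floats. As a rough sketch (assuming CPython 3 and the documented rule that a rational m/n hashes to m times the modular inverse of n, reduced modulo P), negative infinity hashes to -sys.hash_info.inf, and 0.5 hashes to the inverse of 2 modulo P:

```python
import sys

P = sys.hash_info.modulus

# Infinities hash to +/- sys.hash_info.inf.
pos_inf_ok = hash(float("inf")) == sys.hash_info.inf
neg_inf_ok = hash(float("-inf")) == -sys.hash_info.inf

# 0.5 is the rational 1/2, so its hash is the inverse of 2 modulo P.
# P is prime, so Fermat's little theorem gives the inverse as 2**(P - 2) mod P.
inverse_of_two = pow(2, P - 2, P)
half_ok = hash(0.5) == inverse_of_two

print(pos_inf_ok, neg_inf_ok, half_ok)
```

This is why hashes of values such as random.random() stay below the modulus: they are residues modulo P, not the raw bit pattern of the float.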