Why is hash() slower under python3.4 vs python2.7
Problem description
I was doing some performance evaluation using timeit and discovered a performance degradation between Python 2.7.10 and Python 3.4.3. I narrowed it down to the hash() function:
Python 2.7.10:
>>> import timeit
>>> timeit.timeit('for x in xrange(100): hash(x)', number=100000)
0.4529099464416504
>>> timeit.timeit('hash(1000)')
0.044638872146606445
Python 3.4.3:
>>> import timeit
>>> timeit.timeit('for x in range(100): hash(x)', number=100000)
0.6459149940637872
>>> timeit.timeit('hash(1000)')
0.07708719989750534
That's an approx. 40% degradation! It doesn't seem to matter whether integers, floats, strings (unicode or bytes), etc. are being hashed; the degradation is about the same. In both cases hash() returns a 64-bit integer. The above was run on my Mac; on an Ubuntu box the degradation was smaller (about 20%).
I've also run the Python 2.7 tests with PYTHONHASHSEED=random, in some cases restarting Python for each "case". I saw hash() performance get a bit worse, but never as slow as Python 3.4.
Does anyone know what's going on here? Was a more secure, but slower, hash function chosen for Python 3?
There are two changes to the hash() function between Python 2.7 and Python 3.4:
- Adoption of SipHash
- Hash randomization enabled by default
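On Python 3.4 or later, sys.hash_info reports which algorithm the interpreter uses for str and bytes hashing (the algorithm, hash_bits and seed_bits fields were added in 3.4 by PEP 456). The sketch below just prints those fields; the values depend on the build and version, for example 'siphash24' on CPython 3.4 through 3.10 and 'siphash13' from 3.11. describe_hashing is a hypothetical helper. It also shows that in CPython small ints hash to themselves, so SipHash is not what runs when ints are hashed.

```python
import sys


def describe_hashing():
    """Collect the hash parameters CPython reports through sys.hash_info."""
    info = sys.hash_info
    return {
        "algorithm": info.algorithm,  # str/bytes hash algorithm, e.g. 'siphash24'
        "width": info.width,          # width in bits of values returned by hash()
        "hash_bits": info.hash_bits,  # internal output size of the str/bytes hash
        "seed_bits": info.seed_bits,  # size of the randomization key
    }


if __name__ == "__main__":
    for name, value in describe_hashing().items():
        print(name, value)
    # int hashing is plain modular arithmetic in CPython: small ints hash to themselves.
    print(hash(1000))  # 1000
```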
References:
- Since Python 3.4, it uses SipHash as the hash function for str and bytes objects. Read: Python adopts SipHash
- Since Python 3.3, hash randomization is enabled by default. Reference: object.__hash__ (last line of this section). Setting PYTHONHASHSEED to 0 disables hash randomization.
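Because the seed is chosen once per process, randomization is easiest to observe by hashing in fresh interpreters. The sketch below launches child processes with subprocess and a chosen PYTHONHASHSEED; hash_in_child is a hypothetical helper written for this illustration, and the expected outputs assume CPython.

```python
import os
import subprocess
import sys


def hash_in_child(expr, seed):
    """Evaluate hash(expr) in a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr], env=env
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # Seed 0 disables randomization: the str hash is identical across processes.
    print(hash_in_child("'abc'", 0) == hash_in_child("'abc'", 0))  # True
    # With "random", each process normally gets a different str hash.
    print(hash_in_child("'abc'", "random"), hash_in_child("'abc'", "random"))
    # Small ints are not randomized: in CPython they hash to themselves whatever the seed.
    print(hash_in_child("1000", "random"))  # 1000
```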