hash function in Python 3.3 returns different results between sessions

Problem description

I've implemented a Bloom filter in Python 3.3 and got different results in every session. Drilling down into this weird behavior led me to the built-in hash() function: it returns a different hash value for the same string in every session.

Example:

>>> hash("235")
-310569535015251310

----- opening a new python console -----

>>> hash("235")
-1900164331622581997

Why is this happening? Why is this useful?

Solution

Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure (http://www.ocert.org/advisories/ocert-2011-003.html). Because the hash is offset with a random seed that is set once at startup, attackers can no longer predict which keys will collide.

You can set a fixed seed or disable the feature by setting the PYTHONHASHSEED environment variable; the default is random but you can set it to a fixed positive integer value, with 0 disabling the feature altogether.
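As a rough illustration of the PYTHONHASHSEED setting described above, the sketch below starts fresh interpreters with a chosen seed and compares the string hashes they report. The helper name hash_in_fresh_interpreter and the seed value 42 are made up for this example; it assumes Python 3.7 or later for the capture_output argument.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process
    started with PYTHONHASHSEED set to the given seed."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two separate processes with the same fixed seed agree on the hash.
same_a = hash_in_fresh_interpreter("235", 42)
same_b = hash_in_fresh_interpreter("235", 42)
print(same_a == same_b)  # True
```

Passing "random" as the seed restores the default behavior, in which separate processes will almost always report different values for the same string.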

Python versions 2.7 and 3.2 have the feature disabled by default (use the -R switch or set PYTHONHASHSEED=random to enable it); it is enabled by default in Python 3.3 and up.

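If you were relying on the order of keys in a Python dictionary or set, then don't. In Python 3.3 both types are implemented as hash tables, and their iteration order depends on the insertion and deletion history as well as the random hash seed. (Since Python 3.7, dictionaries preserve insertion order; sets still make no ordering guarantee.)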

Also see the object.__hash__() special method documentation:

Note: By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
See also PYTHONHASHSEED.

If you need a stable hash implementation, you probably want to look at the hashlib module; this implements cryptographic hash functions. The pybloom project uses this approach.
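For a Bloom filter, a hashlib-based approach could look roughly like the sketch below. This is not pybloom's actual code; the function name bloom_positions, the choice of SHA-256, and the way each round is salted with its index are assumptions made for illustration.

```python
import hashlib


def bloom_positions(item, num_hashes, num_bits):
    """Yield num_hashes bit positions in range(num_bits) for a string item.

    The positions depend only on the item's bytes, so they are the same
    in every Python session regardless of PYTHONHASHSEED.
    """
    data = item.encode("utf-8")
    for i in range(num_hashes):
        # Prefix each round with its index so the rounds behave like
        # distinct hash functions over the same input.
        digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
        # Reduce the first 8 bytes of the digest to a bit index.
        yield int.from_bytes(digest[:8], "big") % num_bits


positions = list(bloom_positions("235", num_hashes=3, num_bits=1024))
print(positions)
```

The trade-off is speed: a cryptographic digest is slower than the built-in hash(), but its output is stable across sessions, machines, and Python versions.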

Since the offset consists of a prefix and a suffix (start value and final XORed value, respectively) you cannot just store the offset, unfortunately. On the plus side, this does mean that attackers cannot easily determine the offset with timing attacks either.
