hash function in Python 3.3 returns different results between sessions

Question

I've implemented a BloomFilter in Python 3.3, and got different results every session. Drilling down into this weird behavior led me to the built-in hash() function: it returns different hash values for the same string every session.

Example:

>>> hash("235")
-310569535015251310

----- opening a new python console -----

>>> hash("235")
-1900164331622581997

Why is this happening? Why is this useful?

Solution

Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure (oCERT advisory 2011-003). By offsetting the hash with a random seed (set once at startup), attackers can no longer predict which keys will collide.

You can set a fixed seed or disable the feature by setting the PYTHONHASHSEED environment variable; the default is random but you can set it to a fixed positive integer value, with 0 disabling the feature altogether.
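As a rough illustration of the effect of PYTHONHASHSEED, the sketch below starts fresh interpreters with a chosen seed and compares the resulting hash values. The helper name hash_in_new_process is made up for this example, and it assumes the current interpreter (sys.executable) is Python 3.3 or newer.

```python
import os
import subprocess
import sys


def hash_in_new_process(text, seed):
    """Return hash(text) as computed by a fresh interpreter.

    `seed` is passed through the PYTHONHASHSEED environment variable
    (hypothetical helper, for illustration only).
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)


# Same fixed seed in two separate processes: the hashes should match.
fixed_a = hash_in_new_process("235", 42)
fixed_b = hash_in_new_process("235", 42)
print(fixed_a == fixed_b)

# Default behaviour (random seed): the hashes will almost certainly differ.
random_a = hash_in_new_process("235", "random")
random_b = hash_in_new_process("235", "random")
print(random_a == random_b)
```

Fixing the seed this way is mostly useful for reproducing a bug or a test run; it also gives up the collision protection described above, so it is probably not something to rely on in production.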

Python versions 2.7 and 3.2 have the feature disabled by default (use the -R switch or set PYTHONHASHSEED=random to enable it); it is enabled by default in Python 3.3 and up.

If you were relying on the order of keys in a Python set, then don't. Python uses a hash table to implement these types and their order depends on the insertion and deletion history as well as the random hash seed. Note that in Python 3.5 and older, this applies to dictionaries, too.
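If some code needs a reproducible order for the elements of a set, one simple option is to sort them explicitly instead of relying on iteration order. A minimal sketch, with made-up example values:

```python
keys = {"pear", "apple", "fig"}

# Iterating over `keys` directly may give a different order in each session;
# sorting produces the same order regardless of the hash seed.
ordered = sorted(keys)
print(ordered)  # ['apple', 'fig', 'pear']
```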

Also see the object.__hash__() special method documentation:

Note: By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

See also PYTHONHASHSEED.

If you need a stable hash implementation, you probably want to look at the hashlib module; this implements cryptographic hash functions. The pybloom project uses this approach.
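To make the hashlib suggestion concrete, here is a minimal sketch of a Bloom filter whose bit positions come from SHA-256 digests rather than from hash(). It is not pybloom's actual implementation; the names stable_hashes and TinyBloomFilter, the index-based salting scheme, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib


def stable_hashes(item, num_hashes, num_bits):
    """Derive num_hashes bit positions in [0, num_bits) for a string.

    Each round prefixes the data with its index, so the rounds behave
    like different hash functions (illustrative scheme, not pybloom's).
    """
    data = item.encode("utf-8")
    positions = []
    for i in range(num_hashes):
        digest = hashlib.sha256(str(i).encode("ascii") + b":" + data).digest()
        # Use the first 8 bytes of the digest as an unsigned integer.
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


class TinyBloomFilter:
    """Toy Bloom filter; stores one byte per bit to keep the code short."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def add(self, item):
        for pos in stable_hashes(item, self.num_hashes, self.num_bits):
            self.bits[pos] = 1

    def __contains__(self, item):
        positions = stable_hashes(item, self.num_hashes, self.num_bits)
        return all(self.bits[pos] for pos in positions)


bf = TinyBloomFilter()
bf.add("235")
print("235" in bf)  # True: an added item is always reported as present
```

Because the positions depend only on the input bytes, a filter built this way gives the same answers in every session, which is the property the question was missing with hash().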

Since the offset consists of a prefix and a suffix (start value and final XORed value, respectively) you cannot just store the offset, unfortunately. On the plus side, this does mean that attackers cannot easily determine the offset with timing attacks either.
