Why is dictionary ordering non-deterministic?

Question

I recently switched from Python 2.7 to Python 3.3, and it seems that while in Python 2 the ordering of dictionary keys was arbitrary but consistent, in Python 3 the ordering of the keys of a dictionary obtained with e.g. vars() appears non-deterministic.

If I run:

class Test(object): pass
parameters = vars(Test)
print(list(parameters.keys()))

in both Python 2.7 and Python 3.3, then:

  • Python 2.7 consistently gives me

    ['__dict__', '__module__', '__weakref__', '__doc__']
    

  • With Python 3.3, I can get any random order – for example:

    ['__weakref__', '__module__', '__qualname__', '__doc__', '__dict__']
    ['__doc__', '__dict__', '__qualname__', '__module__', '__weakref__']
    ['__dict__', '__module__', '__qualname__', '__weakref__', '__doc__']
    ['__weakref__', '__doc__', '__qualname__', '__dict__', '__module__']
    

Where does this non-determinism come from? And why is something like

list({str(i): i for i in range(10)}.keys())

… consistent between runs, always giving

['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']

… ?

Solution


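Update: In Python 3.6, dict has a new implementation which preserves insertion order. In CPython 3.6 this was an implementation detail and should not be relied on there; from Python 3.7 onwards, insertion order is a guaranteed part of the language. Sets remain unordered.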
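
As a small sketch of what the update means in practice (assuming CPython 3.6 or any Python 3.7+, where dict ordering is guaranteed; the key names are arbitrary), the dict below iterates in the order the keys were inserted. When a fixed order is needed regardless of interpreter version, sorting the keys is the portable choice:

```python
# Keys inserted in a deliberately non-alphabetical order.
params = {}
for name in ["__weakref__", "__doc__", "__module__"]:
    params[name] = None

# On CPython 3.6 and on any Python 3.7+, iteration follows insertion order.
insertion_order = list(params)

# sorted() gives an order that does not depend on the dict implementation.
sorted_order = sorted(params)

print(insertion_order)
print(sorted_order)
```

Applied to the question, `sorted(vars(Test))` would give the same key order on every run and every version.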


This is the result of a security fix from 2012, which was enabled by default in Python 3.3 (scroll down to "Security improvements").

From the announcement:

Hash randomization causes the iteration order of dicts and sets to be unpredictable and differ across Python runs. Python has never guaranteed iteration order of keys in a dict or set, and applications are advised to never rely on it. Historically, dict iteration order has not changed very often across releases and has always remained consistent between successive executions of Python. Thus, some existing applications may be relying on dict or set ordering. Because of this and the fact that many Python applications which don't accept untrusted input are not vulnerable to this attack, in all stable Python releases mentioned here, HASH RANDOMIZATION IS DISABLED BY DEFAULT.

As noted above, the last, capitalized bit is no longer true in Python 3.3.

See also: object.__hash__() documentation ("Note" sidebar).

If absolutely necessary, you can disable hash randomization in versions of Python affected by this behaviour by setting the PYTHONHASHSEED environment variable to 0.
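
The effect of PYTHONHASHSEED can be checked from Python itself. The following is only a sketch: it assumes `sys.executable` points to a working CPython interpreter, and the helper name `hash_in_fresh_interpreter` is made up for this example. It starts a new interpreter per call, because the hash seed is fixed at interpreter startup and cannot be changed afterwards:

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process
    started with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


# With the seed pinned to 0, randomization is off and runs agree.
first = hash_in_fresh_interpreter("__doc__", 0)
second = hash_in_fresh_interpreter("__doc__", 0)
print(first == second)  # True

# With "random" (the default since Python 3.3), each run picks its own seed,
# so these two values will usually differ.
print(hash_in_fresh_interpreter("__doc__", "random"))
print(hash_in_fresh_interpreter("__doc__", "random"))
```

Any fixed integer seed gives repeatable hashes; 0 is the special value that turns randomization off entirely.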


Your counterexample:

list({str(i): i for i in range(10)}.keys())

… does not in fact always give the same result in Python 3.3, although the number of different orderings is limited due to the way hash collisions are handled:

$ for x in {0..999}
> do
>   python3.3 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
     61 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
     73 ['1', '0', '3', '2', '5', '4', '7', '6', '9', '8']
     62 ['2', '3', '0', '1', '6', '7', '4', '5', '8', '9']
     59 ['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']
     58 ['4', '5', '6', '7', '0', '1', '2', '3', '8', '9']
     55 ['5', '4', '7', '6', '1', '0', '3', '2', '9', '8']
     62 ['6', '7', '4', '5', '2', '3', '0', '1', '8', '9']
     63 ['7', '6', '5', '4', '3', '2', '1', '0', '9', '8']
     60 ['8', '9', '0', '1', '2', '3', '4', '5', '6', '7']
     66 ['8', '9', '2', '3', '0', '1', '6', '7', '4', '5']
     65 ['8', '9', '4', '5', '6', '7', '0', '1', '2', '3']
     53 ['8', '9', '6', '7', '4', '5', '2', '3', '0', '1']
     62 ['9', '8', '1', '0', '3', '2', '5', '4', '7', '6']
     52 ['9', '8', '3', '2', '1', '0', '7', '6', '5', '4']
     73 ['9', '8', '5', '4', '7', '6', '1', '0', '3', '2']
     76 ['9', '8', '7', '6', '5', '4', '3', '2', '1', '0']

As noted at the beginning of this answer, that's no longer the case in Python 3.6:

$ for x in {0..999}
> do
>   python3.6 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
   1000 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
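
Sets have no such ordering guarantee, and string hashes are still randomized per run, so the same experiment can be repeated with a set on current Python versions. A rough Python equivalent of the shell loop above, assuming `sys.executable` is a CPython interpreter (the helper name `count_set_orderings` and the run count of 20 are arbitrary choices for this sketch):

```python
import collections
import os
import subprocess
import sys

# Same idea as the dict comprehension above, but building a set instead.
SNIPPET = "print(list({str(i) for i in range(10)}))"


def count_set_orderings(runs=20):
    """Run SNIPPET in `runs` fresh interpreters and count each distinct output."""
    # Force randomization on, in case the parent environment pins a seed.
    env = dict(os.environ, PYTHONHASHSEED="random")
    counts = collections.Counter()
    for _ in range(runs):
        out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
        counts[out.decode().strip()] += 1
    return counts


counts = count_set_orderings()
for ordering, n in sorted(counts.items()):
    print(n, ordering)
```

The output plays the role of `sort | uniq -c`: one line per distinct ordering, prefixed by how often it occurred.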
