无序 Python 集的“顺序" [英] 'order' of unordered Python sets

查看:37
本文介绍了无序 Python 集的“顺序"的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道 Python 中的集合是无序的,但我很好奇它们显示的顺序",因为它似乎是一致的.他们似乎每次都以同样的方式乱序:

<预><代码>>>>set_1 = set([5, 2, 7, 2, 1, 88])>>>set_2 = set([5, 2, 7, 2, 1, 88])>>>设置_1设置([88, 1, 2, 5, 7])>>>设置_2设置([88, 1, 2, 5, 7])

...还有另一个例子:

<预><代码>>>>set_3 = set('abracadabra')>>>set_4 = set('abracadabra')>>>set_3设置(['a','r','b','c','d'])>>>>设置_4设置(['a','r','b','c','d'])

我只是好奇为什么会这样.有什么帮助吗?

解决方案

您应该观看此视频 (虽然它是 CPython1 特定的和关于字典的——但我认为它也适用于集合).

基本上,python 对元素进行散列并取最后 N 位(其中 N 由集合的大小决定)并将这些位用作数组索引以将对象放入内存中.然后对象按照它们在内存中存在的顺序产生.当然,当您需要解决散列之间的冲突时,情况会变得更复杂一些,但这就是它的要点.

另请注意,它们的打印顺序由您放置它们的顺序决定(由于冲突).因此,如果您对传递给 set_2 的列表重新排序,如果出现键冲突,您可能会得到不同的顺序.

例如:

list1 = [8,16,24]设置(列表1)#设置([8, 16, 24])list2 = [24,16,8]set(list2) #set([24, 16, 8])

注意在这些集合中保留顺序的事实是巧合",并且与冲突解决有关(我对此一无所知).关键是hash(8)hash(16)hash(24)的最后3位是一样的.因为它们是相同的,冲突解决接管并将元素放在备份"内存位置而不是第一个(最佳)选择,因此 8 占用一个位置还是 16 由谁先到达派对并占据最佳座位"来决定.

如果我们用 123 重复这个例子,无论它们在输入列表:

list1 = [1,2,3]set(list1) # set([1, 2, 3])列表2 = [3,2,1]set(list2) # set([1, 2, 3])

由于hash(1)的最后3位,hash(2)hash(3)是唯一的.

<小时>

1注意 此处描述的实现适用于 CPython dictset.我认为一般描述适用于 CPython 3.6 以下的所有现代版本.但是,从 CPython3.6 开始,还有一个额外的实现细节,它实际上保留了 dict 迭代的插入顺序.看来 set 仍然没有这个属性.这篇博文 由 pypy 人员(他们在 CPython 人员之前开始使用它).最初的想法(至少对于 python 生态系统)存档在python-dev 邮件列表.

I understand that sets in Python are unordered, but I'm curious about the 'order' they're displayed in, as it seems to be consistent. They seem to be out-of-order in the same way every time:

>>> set_1 = set([5, 2, 7, 2, 1, 88])
>>> set_2 = set([5, 2, 7, 2, 1, 88])
>>> set_1
set([88, 1, 2, 5, 7])
>>> set_2
set([88, 1, 2, 5, 7])

...and another example:

>>> set_3 = set('abracadabra')
>>> set_4 = set('abracadabra')
>>> set_3
set(['a', 'r', 'b', 'c', 'd'])
>>>> set_4
set(['a', 'r', 'b', 'c', 'd'])

I'm just curious why this would be. Any help?

解决方案

You should watch this video (although it is CPython1 specific and about dictionaries -- but I assume it applies to sets as well).

Basically, python hashes the elements and takes the last N bits (where N is determined by the size of the set) and uses those bits as array indices to place the object in memory. The objects are then yielded in the order they exist in memory. Of course, the picture gets a little more complicated when you need to resolve collisions between hashes, but that's the gist of it.

Also note that the order that they are printed out is determined by the order that you put them in (due to collisions). So, if you reorder the list you pass to set_2, you might get a different order out if there are key collisions.

For example:

list1 = [8,16,24]
set(list1)        #set([8, 16, 24])
list2 = [24,16,8]
set(list2)        #set([24, 16, 8])

Note the fact that the order is preserved in these sets is "coincidence" and has to do with collision resolution (which I don't know anything about). The point is that the last 3 bits of hash(8), hash(16) and hash(24) are the same. Because they are the same, collision resolution takes over and puts the elements in "backup" memory locations instead of the first (best) choice and so whether 8 occupies a location or 16 is determined by which one arrived at the party first and took the "best seat".

If we repeat the example with 1, 2 and 3, you will get a consistent order no matter what order they have in the input list:

list1 = [1,2,3]
set(list1)      # set([1, 2, 3])
list2 = [3,2,1]
set(list2)      # set([1, 2, 3])

since the last 3 bits of hash(1), hash(2) and hash(3) are unique.


1Note The implementation described here applies to CPython dict and set. I think that the general description is valid for all modern versions of CPython up to 3.6. However, starting with CPython3.6, there is an additional implementation detail that actually preserves the insertion order for iteration for dict. It appears that set still do not have this property. The data structure is described by this blog post by the pypy folks (who started using this before the CPython folks). The original idea (at least for the python ecosystem) is archived on the python-dev mailing list.

这篇关于无序 Python 集的“顺序"的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆