为什么字典中的顺序和任意设置? [英] Why is the order in dictionaries and sets arbitrary?

查看:143
本文介绍了为什么字典中的顺序和任意设置?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我不明白如何循环使用字典或在python中设置是通过任意的顺序完成的。

I don't understand how looping over a dictionary or set in python is done by 'arbitrary' order.

我的意思是,这是一种编程语言,语言必须100%确定,正确吗? Python必须有某种算法来决定哪一部分字典或集合被选中,第一,第二等等。

I mean, it's a programming language so everything in the language must be 100% determined, correct? Python must have some kind of algorithm that decides which part of the dictionary or set is chosen, 1st, second and so on.

我缺少什么?

推荐答案

订单不是任意的,但取决于字典或集合的插入和删除历史,以及具体的Python实现。对于这个答案的其余部分,对于'字典',你也可以读'set';集合被实现为具有键和无值的字典。

The order is not arbitrary, but depends on the insertion and deletion history of the dictionary or set, as well as on the specific Python implementation. For the remainder of this answer, for 'dictionary', you can also read 'set'; sets are implemented as dictionaries with just keys and no values.

键被散列,并且哈希值被分配给动态表中的插槽(它可以基于需要)。而且映射过程可能会导致冲突,这意味着一个密钥必须根据已经存在的密钥在下一个插槽中。

Keys are hashed, and hash values are assigned to slots in a dynamic table (it can grow or shrink based on needs). And that mapping process can lead to collisions, meaning that a key will have to be slotted in a next slot based on what is already there.

在插槽上列出内容循环,所以按照他们 的顺序列在表格中。

Listing the contents loops over the slots, and so keys are listed in the order they currently reside in the table.

拿钥匙<例如,code>'foo'和'bar',让我们假设表格大小是8个插槽。在Python 2.7中, hash('foo') -4177197833195190597 hash('bar ') 327024216814240868 。模8,这意味着这两个键在插槽3和4中插槽,然后:

Take the keys 'foo' and 'bar', for example, and lets assume the table size is 8 slots. In Python 2.7, hash('foo') is -4177197833195190597, hash('bar') is 327024216814240868. Modulo 8, that means these two keys are slotted in slots 3 and 4 then:

>>> hash('foo')
-4177197833195190597
>>> hash('foo') % 8
3
>>> hash('bar')
327024216814240868
>>> hash('bar') % 8
4

这将通知他们的列表顺序: p>

This informs their listing order:

>>> {'bar': None, 'foo': None}
{'foo': None, 'bar': None}

除3和4之外的所有插槽为空,循环播放表首先列出插槽3,然后插槽4,因此'foo'已列出之前'bar'

All slots except 3 and 4 are empty, looping over the table first lists slot 3, then slot 4, so 'foo' is listed before 'bar'.

bar baz ,但是,具有完全相隔8的哈希值,因此映射到完全相同的插槽, 4

bar and baz, however, have hash values that are exactly 8 apart and thus map to the exact same slot, 4:

>>> hash('bar')
327024216814240868
>>> hash('baz')
327024216814240876
>>> hash('bar') % 8
4
>>> hash('baz') % 8
4

他们的订单现在取决于哪个键是开槽先第二个键将被移动到下一个插槽:

Their order now depends on which key was slotted first; the second key will have to be moved to a next slot:

>>> {'baz': None, 'bar': None}
{'bar': None, 'baz': None}
>>> {'bar': None, 'baz': None}
{'baz': None, 'bar': None}

表格顺序在这里不一样,因为一个或另一个键先被插槽。

The table order differs here, because one or the other key was slotted first.

CPython使用的底层结构的技术名称(最常用的Python实现)是一个哈希表,它使用开放式寻址。如果您好奇,并且C足够了解,请查看 C所有(详细记录)的细节的实施。您还可以观看由Brandon Rhodes 发布的Pycon 2010演示文稿,了解如何使用CPython dict 的作品,或者拿起一份美丽的代码,其中包括由Andrew Kuchling编写的一章。

The technical name for the underlying structure used by CPython (the most commonly used Python implemenation) is a hash table, one that uses open addressing. If you are curious, and understand C well enough, take a look at the C implementation for all the (well documented) details. You could also watch this Pycon 2010 presentation by Brandon Rhodes about how CPython dict works, or pick up a copy of Beautiful Code, which includes a chapter on the implementation written by Andrew Kuchling.

请注意,从Python 3.3开始,还使用随机哈希种子,使哈希碰撞不可预测,以防止某些类型的拒绝服务(攻击者通过引起大量哈希冲突而使Python服务器无响应)。这意味着依赖于当前Python调用的随机哈希种子,给定字典的顺序是

Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary is then also dependent on the random hash seed for the current Python invocation.

其他实现是免费的对于字典使用不同的结构,只要它们满足他们记录的Python界面,但我相信所有的实现都是使用哈希表的变体。

Other implementations are free to use a different structure for dictionaries, as long as they satisfy the documented Python interface for them, but I believe that all implementations so far use a variation of the hash table.

CPython 3.6引入了维护插入顺序的新型 dict 实现,并且启动速度更快,内存更高。而不是保留一个大的稀疏表,其中每行引用存储的哈希值,以及键和值对象,新的实现将添加一个较小的哈希数组,它只引用密集表中的索引(仅包含与实际键值对一样多的行),并且它是按顺序列出包含的项目的密集表。有关详细信息,请参阅 Python-Dev提案。请注意,这被认为是实现细节,Python语言没有指定其他实现必须保留顺序。

CPython 3.6 introduces a new dict implementation that maintains insertion order, and is faster and more memory efficient to boot. Rather than keep a large sparse table where each row references the stored hash value, and the key and value objects, the new implementation adds a smaller hash array that only references indices in dense table (one that only contains as many rows as there are actual key-value pairs), and it is the dense table that happens to list the contained items in order. See the proposal to Python-Dev for more details. Note that this is considered an implementation detail, Python-the-language does not specify that other implementations have to retain order.

Python 2.7和较新的还提供了一个 OrderedDict class ,一个 dict 的子类,添加了一个附加数据结构来记录关键顺序。以一些速度和额外的记忆的价格,这个类别记住你以什么顺序插入键;列出密钥,值或项目将按此顺序执行。它使用存储在附加字典中的双向链表来保持订单的有效性。请参阅Raymond Hettinger发布的概述。请注意,设置类型仍然无序。

Python 2.7 and newer also provides an OrderedDict class, a subclass of dict that adds an additional data structure to record key order. At the price of some speed and extra memory, this class remembers in what order you inserted keys; listing keys, values or items will then do so in that order. It uses a doubly-linked list stored in an additional dictionary to keep the order up-to-date efficiently. See the post by Raymond Hettinger outlining the idea. Note that the set type is still unordered.

如果您想要一个有序集,您可以安装 oset package ;它适用于Python 2.5及更高版本。

If you wanted an ordered set, you can install the oset package; it works on Python 2.5 and up.

这篇关于为什么字典中的顺序和任意设置?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆