在Python 3.6+中是否订购了字典? [英] Are dictionaries ordered in Python 3.6+?

查看:78
本文介绍了在Python 3.6+中是否订购了字典?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

与以前的版本不同,字典在Python 3.6中排序(至少在CPython实现下).这似乎是一个实质性的更改,但这只是文档.它被描述为CPython实现细节而不是语言功能,但这也意味着将来可能会成为标准.

Dictionaries are ordered in Python 3.6 (under the CPython implementation at least) unlike in previous incarnations. This seems like a substantial change, but it's only a short paragraph in the documentation. It is described as a CPython implementation detail rather than a language feature, but also implies this may become standard in the future.

在保留元素顺序的同时,新的字典实现与旧的实现相比如何有更好的表现?

How does the new dictionary implementation perform better than the older one while preserving element order?

以下是文档中的文字:

dict()现在使用紧凑"表示形式 PEP 468 (在函数中保留** kwargs的顺序.)这样.此新实现的顺序保留方面被认为是实现细节,因此不应依赖(将来可能会更改,但是希望在更改语言规范之前,先在几个发行版中使用该新dict实现该语言,为所有当前和将来的Python实现强制要求保留顺序的语义;这还有助于保留与仍旧有效的随机迭代顺序的旧版本语言(例如Python 3.5)的向后兼容性. (由INADA Naoki在 issue 27350 中提供.想法

dict() now uses a "compact" representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

2017年12月更新:dict的保留插入顺序为保证(适用于Python 3.7

Update December 2017: dicts retaining insertion order is guaranteed for Python 3.7

推荐答案

是否在Python 3.6+中订购了字典?

它们是插入顺序 [1] .从Python 3.6开始,对于Python的CPython实现,词典记住插入项目的顺序. 这被视为Python 3.6中的实现细节;如果要在其他Python实现中实现保证的插入顺序(以及其他有序的行为 [1] ),则需要使用OrderedDict.

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

从Python 3.7起,这不再是实现细节,而是成为一种语言功能. 来自GvR的py​​thon-dev消息:

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

做到这一点.裁定为辞典保持插入顺序".谢谢!

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

这只是意味着您可以依靠它.如果其他Python实现希望成为Python 3.7的一致实现,则还必须提供插入顺序字典.

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.

在保留元素顺序的同时,Python 3.6词典实现与旧版本相比如何表现出更好的表现? [2] ?

How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

本质上,通过保留两个数组.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

第二个 dk_indices 保留dk_entries数组的索引(即,指示相应条目在dk_entries中的位置的值).该数组充当哈希表.散列键时,它会导致存储在dk_indices中的索引之一,并且通过索引dk_entries来获取相应的条目.由于仅保留索引,因此此数组的类型取决于字典的整体大小(范围为 int64_t (32/64位生成的4/8字节)

The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

在先前的实现中,必须分配类型为PyDictKeyEntry且大小为dk_size的稀疏数组;不幸的是,这也导致了很多空白空间,因为该数组不允许超过2/3 * dk_size完整

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

现在不是这种情况,因为仅存储了必需的条目(已插入的条目)和类型为intX_t(取决于字典大小的X)稀疏数组已满.空格从类型PyDictKeyEntry更改为intX_t.

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

因此,很明显,创建类型为PyDictKeyEntry的稀疏数组比存储int的稀疏数组需要更多的内存.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

您可以在Python-Dev上看到完整的对话 有关此功能(如果有兴趣的话),很好阅读.

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.

在Raymond Hettinger提出的原始提案中 ,可以看到所使用的数据结构的可视化效果,从而捕捉到了构想的精髓.

In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

例如,字典:

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

当前存储为[keyhash,key,value]:

is currently stored as [keyhash, key, value]:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

相反,数据应按以下方式组织:

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

正如您现在可以直观地看到的那样,在最初的建议中,实际上有很多空间是空的,以减少冲突并加快查找速度.使用新方法,您可以通过将稀疏真正地移动到索引中来减少所需的内存.

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.

[1]:我说插入排序"而不是排序",因为在存在OrderedDict的情况下,排序"暗示了dict对象不提供的其他行为. OrderedDicts是可逆的,提供顺序敏感的方法,并且主要提供顺序敏感的相等性测试(==!=). dict当前不提供任何这些行为/方法.

[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the dict object doesn't provide. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (==, !=). dicts currently don't offer any of those behaviors/methods.

[2]:通过更加紧凑的设计,新的字典实现在明智的存储方式上表现更好.这是这里的主要好处.在速度方面,差异并不那么明显,在某些地方,新dict可能会引入轻微的回归(键查找,用于例如),而在其他情况(想到迭代和调整大小)的情况下,则应该提高性能.

[2]: The new dictionary implementations performs better memory wise by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.

总体而言,由于引入了紧凑性,字典的性能(尤其是在现实生活中)得以提高.

Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

这篇关于在Python 3.6+中是否订购了字典?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆