Why is the __dict__ of instances so much smaller in size in Python 3?
Problem description

In Python, dictionaries created for the instances of a class are tiny compared to the dictionaries created containing the same attributes of that class:
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)
When using Python 3.5.2, the following calls to getsizeof produce:

>>> sys.getsizeof(vars(f))  # vars gets obj.__dict__
96
>>> sys.getsizeof(dict(vars(f)))
288
288 - 96 = 192 bytes saved!

Using Python 2.7.12, though, the same calls return:
>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280
0 bytes saved.

In both cases, the dictionaries obviously have exactly the same contents:
>>> vars(f) == dict(vars(f))
True
so that isn't a factor. Note also that this difference appears only in Python 3.
So, what's going on here? Why is the size of the __dict__ of an instance so tiny in Python 3?

Solution

In short:
Instance __dict__s are implemented differently from the 'normal' dictionaries created with dict or {}. The dictionaries of instances share their keys and hashes and keep a separate array for the part that differs: the values. sys.getsizeof counts only those values when calculating the size of an instance dict.

A bit more:
Dictionaries in CPython are, as of Python 3.3, implemented in one of two forms:
- Combined dictionary: all values of the dictionary are stored alongside the key and hash of each entry (the me_value member of the PyDictKeyEntry struct). As far as I know, this form is used for dictionaries created with dict or {} and for module namespaces.
- Split table: the values are stored separately in an array, while the keys and hashes are shared (the values are stored in the ma_values member of PyDictObject).
Instance dictionaries are always implemented in a split-table form (a key-sharing dictionary), which allows instances of a given class to share the keys (and hashes) of their __dict__ and differ only in the corresponding values.

This is all described in PEP 412 -- Key-Sharing Dictionary. The implementation of the split dictionary landed in Python 3.3, so previous versions of the 3 family, as well as Python 2.x, don't have it.

The implementation of __sizeof__ for dictionaries takes this fact into account and, for a split dictionary, only considers the size that corresponds to the values array. It is, thankfully, self-explanatory:
Py_ssize_t size, res;

size = DK_SIZE(mp->ma_keys);
res = _PyObject_SIZE(Py_TYPE(mp));

if (mp->ma_values)                    /* Add the values to the result */
    res += size * sizeof(PyObject*);
/* If the dictionary is split, the keys portion is accounted-for
   in the type object. */
if (mp->ma_keys->dk_refcnt == 1)      /* Add keys/hashes size to res */
    res += sizeof(PyDictKeysObject) + (size - 1) * sizeof(PyDictKeyEntry);
return res;
As far as I know, split-table dictionaries are created only for the namespaces of instances; using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.
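The key sharing is easy to observe indirectly (a minimal sketch using a hypothetical Point class; exact byte counts vary across CPython versions, so only the comparisons matter):

```python
import sys

# Hypothetical example class, not from the answer above.
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(i, i + 1) for i in range(1000)]

# Every instance __dict__ reports the same small size: the shared
# keys/hashes table is not counted per instance.
sizes = {sys.getsizeof(vars(p)) for p in points}
print(sizes)

# A plain-dict copy of the same contents is a combined dictionary,
# so it carries its own keys/hashes and reports at least as large a size.
split = sys.getsizeof(vars(points[0]))
combined = sys.getsizeof(dict(vars(points[0])))
print(split, combined)
```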
As an aside, since it's fun, we can always break this optimization. There are two ways I've found so far: a silly one, and a more sensible scenario:
Being silly:

>>> f = Foo(20, 30)
>>> getsizeof(vars(f))
96
>>> vars(f).update({1: 1})  # add a non-string key
>>> getsizeof(vars(f))
288
Split tables only support string keys; adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all the memory gains.
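The silly case can be packaged as a quick self-check (a sketch; the literal sizes 96 and 288 are specific to the answer's interpreter, so the script only compares before and after):

```python
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)
before = sys.getsizeof(vars(f))

vars(f).update({1: 1})   # non-string key: the split table gets combined
after = sys.getsizeof(vars(f))

# The combined dict now accounts for its own keys/hashes as well.
print(before, after)
```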
A scenario that might happen:

>>> f1, f2 = Foo(20, 30), Foo(30, 40)
>>> for i, j in enumerate([f1, f2]):
...     setattr(j, 'i' + str(i), i)
...     print(getsizeof(vars(j)))
96
288
Different keys being inserted into the instances of a class will eventually lead to the split table getting combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.
# after running the previous snippet
>>> getsizeof(vars(Foo(100, 200)))
288
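The knock-on effect on future instances can likewise be scripted (a sketch with a hypothetical Bar class; the exact sizes and the precise point where sharing is abandoned are version-dependent, so only the relative comparison matters):

```python
import sys

# Hypothetical class, separate from Foo, so its keys start out shared.
class Bar(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

split_size = sys.getsizeof(vars(Bar(1, 2)))

# Insert *different* extra keys into two instances; CPython then
# gives up on key sharing for this class.
b1, b2 = Bar(1, 2), Bar(3, 4)
b1.only_on_b1 = 1
b2.only_on_b2 = 2

# Size of a fresh instance's dict after sharing was broken.
combined_size = sys.getsizeof(vars(Bar(5, 6)))
print(split_size, combined_size)
```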
Of course, there's no good reason, other than fun, for doing this on purpose.
If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionaries are still available; they are just further compacted. (The implementation of dict.__sizeof__ also changed, so some differences should come up in the values returned from getsizeof.)