Why is the __dict__ of instances so small in Python 3?
Problem description
In Python, dictionaries created for the instances of a class are tiny compared to the dictionaries created containing the same attributes of that class:
import sys
class Foo(object):
def __init__(self, a, b):
self.a = a
self.b = b
f = Foo(20, 30)
When using Python 3.5.2, the following calls to getsizeof
produce:
>>> sys.getsizeof(vars(f)) # vars gets obj.__dict__
96
>>> sys.getsizeof(dict(vars(f)))
288
288 - 96 = 192
bytes saved!
Using Python 2.7.12, on the other hand, the same calls return:
>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280
0
bytes saved.
In both cases, the dictionaries obviously have exactly the same contents:
>>> vars(f) == dict(vars(f))
True
so this isn't a factor. Also, note that this applies to Python 3 only.
So, what's going on here? Why is the size of the __dict__
of an instance so tiny in Python 3?
In short:
Instance __dict__'s are implemented differently from the 'normal' dictionaries created with dict or {}. The dictionaries of instances share their keys and hashes and keep a separate array for the parts that differ: the values. sys.getsizeof only counts those values when calculating the size of the instance dict.
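To see the effect at scale, here is a small sketch (the Point class and the count of 1000 are just illustrative choices; exact byte counts vary across CPython versions, but the split total stays well below the combined one):

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# 1000 instances of the same class all share one keys/hashes table.
points = [Point(i, i + 1) for i in range(1000)]

# Each instance __dict__ is a split (key-sharing) dict, so
# sys.getsizeof charges it only for its private values array.
split_total = sum(sys.getsizeof(vars(p)) for p in points)

# A dict() copy is a combined dict that owns its keys and hashes,
# so identical contents report a larger per-dict size.
combined_total = sum(sys.getsizeof(dict(vars(p))) for p in points)

print(split_total, combined_total)
```

The per-instance savings are small, but they are paid once per instance, which is why this matters for programs that create many objects of the same class.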
A bit more:
Dictionaries in CPython are, as of Python 3.3, implemented in one of two forms:
- Combined dictionary: All values of the dictionary are stored alongside the key and hash for each entry (the me_value member of the PyDictKeyEntry struct). As far as I know, this form is used for dictionaries created with dict, {} and for module namespaces.
- Split table: The values are stored separately in an array, while the keys and hashes are shared (the values are stored in ma_values of PyDictObject).
Instance dictionaries are always implemented in a split-table form (a key-sharing dictionary), which allows instances of a given class to share the keys (and hashes) for their __dict__ and differ only in the corresponding values.
This is all described in PEP 412 -- Key-Sharing Dictionary. The implementation of the split dictionary landed in Python 3.3, so previous versions of the 3 family, as well as Python 2.x, don't have it.
The implementation of __sizeof__
for dictionaries takes this fact into account and only considers the size that corresponds to the values array when calculating the size for a split dictionary.
Thankfully, it's self-explanatory:
Py_ssize_t size, res;
size = DK_SIZE(mp->ma_keys);
res = _PyObject_SIZE(Py_TYPE(mp));
if (mp->ma_values) /*Add the values to the result*/
res += size * sizeof(PyObject*);
/* If the dictionary is split, the keys portion is accounted-for
in the type object. */
if (mp->ma_keys->dk_refcnt == 1) /* Add keys/hashes size to res */
res += sizeof(PyDictKeysObject) + (size-1) * sizeof(PyDictKeyEntry);
return res;
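The same accounting can be observed from Python by calling __sizeof__ directly; a hedged sketch (the exact byte values differ between CPython versions):

```python
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)

split = vars(f)         # key-sharing dict: __sizeof__ counts only the values
combined = dict(split)  # combined dict: owns its keys, hashes and values

# The gap roughly corresponds to the keys-and-hashes table
# (PyDictKeysObject) that the split dict shares with its class
# instead of owning, per the C code above.
keys_overhead = combined.__sizeof__() - split.__sizeof__()
print(split.__sizeof__(), combined.__sizeof__(), keys_overhead)
```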
As far as I know, split-table dictionaries are created only for the namespaces of instances; using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.
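As a concrete check of that point, a dict literal with the same contents as the instance dict also comes out combined (a sketch; exact sizes, and whether the two combined dicts report identical sizes, are version-dependent):

```python
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)

literal = {'a': 20, 'b': 30}  # built with {}: a combined table
copied = dict(vars(f))        # built with dict(): also combined

# Only the instance __dict__ itself is split; both stand-alone
# dicts pay for their own keys and hashes and so report more bytes.
print(sys.getsizeof(vars(f)), sys.getsizeof(literal), sys.getsizeof(copied))
```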
As an aside, since it's fun, we can always break this optimization. There are currently two ways I've found: a silly way, and a more sensible scenario:
Being silly:
>>> f = Foo(20, 30)
>>> getsizeof(vars(f))
96
>>> vars(f).update({1: 1})  # add a non-string key
>>> getsizeof(vars(f))
288
Split tables only support string keys; adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all the memory gains.
A scenario that might happen:
>>> f1, f2 = Foo(20, 30), Foo(30, 40)
>>> for i, j in enumerate([f1, f2]):
...     setattr(j, 'i' + str(i), i)
...     print(getsizeof(vars(j)))
96
288
Different keys being inserted into the instances of a class will eventually lead to the split table getting combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.
# after running the previous snippet
>>> getsizeof(vars(Foo(100, 200)))
288
Of course, there's no good reason, other than for fun, to do this on purpose.
If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionaries, while still available, are just further compacted (the implementation of dict.__sizeof__ also changed, so some differences will come up in the values returned from getsizeof).