Why is the __dict__ of instances so small in Python 3?


Question



In Python, dictionaries created for the instances of a class are tiny compared to the dictionaries created containing the same attributes of that class:

import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)

When using Python 3.5.2, the following calls to getsizeof produce:

>>> sys.getsizeof(vars(f))  # vars gets obj.__dict__
96 
>>> sys.getsizeof(dict(vars(f)))
288

288 - 96 = 192 bytes saved!

Using Python 2.7.12, on the other hand, the same calls return:

>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280

0 bytes saved.

In both cases, the dictionaries obviously have exactly the same contents:

>>> vars(f) == dict(vars(f))
True

so the contents aren't a factor. Note also that this discrepancy appears only in Python 3.

So, what's going on here? Why is the size of the __dict__ of an instance so tiny in Python 3?

Solution

In short:

Instance __dict__s are implemented differently from the 'normal' dictionaries created with dict or {}. The dictionaries of a class's instances share the keys and hashes and keep a separate array only for the parts that differ: the values. sys.getsizeof counts only those values when calculating the size of an instance dict.

A bit more:

Dictionaries in CPython are, as of Python 3.3, implemented in one of two forms:

- Combined dictionaries, where the keys, hashes, and values are all stored in the dictionary object itself.
- Split dictionaries, where the keys and hashes live in a separate, shareable structure and the dictionary itself holds only its array of values.

Instance dictionaries are always implemented in a split-table form (a Key-Sharing Dictionary) which allows instances of a given class to share the keys (and hashes) for their __dict__ and only differ in the corresponding values.
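
The savings from key-sharing scale with the number of instances, which a quick sketch can show. (The class name here is illustrative, and absolute byte counts vary across CPython versions, so only the relative sizes are asserted.)

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(i, -i) for i in range(1000)]

# Every instance __dict__ pays only for its values array; the keys
# and hashes live once in a structure shared through the class.
split_total = sum(sys.getsizeof(vars(p)) for p in points)
combined_total = sum(sys.getsizeof(dict(vars(p))) for p in points)
assert split_total < combined_total
```

The gap between the two totals is the keys-and-hashes storage that 1000 plain dicts would each duplicate.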

This is all described in PEP 412 -- Key-Sharing Dictionary. The implementation of the split dictionary landed in Python 3.3, so earlier versions of the 3.x family, as well as Python 2.x, don't have it.

The implementation of __sizeof__ for dictionaries (_PyDict_SizeOf in Objects/dictobject.c) takes this fact into account and only considers the size that corresponds to the values array when calculating the size of a split dictionary.

It is, thankfully, self-explanatory:

Py_ssize_t size, res;

size = DK_SIZE(mp->ma_keys);
res = _PyObject_SIZE(Py_TYPE(mp));
if (mp->ma_values)                    /*Add the values to the result*/
    res += size * sizeof(PyObject*);
/* If the dictionary is split, the keys portion is accounted-for
   in the type object. */
if (mp->ma_keys->dk_refcnt == 1)     /* Add keys/hashes size to res */
    res += sizeof(PyDictKeysObject) + (size-1) * sizeof(PyDictKeyEntry);
return res;

As far as I know, split-table dictionaries are created only for the namespaces of instances; using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.
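
As a sanity check of that claim (a sketch; the literal byte counts are version-dependent, so only the relative ordering is asserted):

```python
import sys

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)
split = vars(f)                # instance namespace: split (key-sharing) table
combined = {'a': 20, 'b': 30}  # dict literal: always a combined table

assert split == combined       # identical contents...
assert sys.getsizeof(split) < sys.getsizeof(combined)  # ...smaller footprint
```

The same holds if `combined` is built with dict(vars(f)): copying always produces a combined table.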


As an aside, since it's fun, we can always break this optimization. There are two ways I've currently found: a silly one, and a more sensible scenario:

  1. Being silly:

    >>> f = Foo(20, 30)
    >>> sys.getsizeof(vars(f))
    96
    >>> vars(f).update({1: 1})  # add a non-string key
    >>> sys.getsizeof(vars(f))
    288
    
    

    Split tables only support string keys; adding a non-string key (which really makes zero sense here) breaks that rule, and CPython turns the split table into a combined one, losing all the memory gains.

  2. A scenario that might happen:

    >>> f1, f2 = Foo(20, 30), Foo(30, 40)
    >>> for i, j in enumerate([f1, f2]):
    ...    setattr(j, 'i'+str(i), i)
    ...    print(sys.getsizeof(vars(j)))
    96
    288
    

    Different keys being inserted into the instances of a class will eventually cause the split table to be combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.

    # after running previous snippet
    >>> sys.getsizeof(vars(Foo(100, 200)))
    288
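
    The combining is tracked per class, so an unrelated class with the same attribute names keeps its own intact shared keys. A sketch (class names are illustrative; only the relative sizes are asserted, since absolute numbers vary by version):

```python
import sys

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Break Foo's key-sharing by inserting divergent keys per instance.
f1, f2 = Foo(20, 30), Foo(30, 40)
for i, obj in enumerate([f1, f2]):
    setattr(obj, 'i' + str(i), i)

class Bar:  # same attribute names, but its own shared-keys structure
    def __init__(self, a, b):
        self.a = a
        self.b = b

b = Bar(20, 30)
# Bar instances still get the split-table savings.
assert sys.getsizeof(vars(b)) < sys.getsizeof(dict(vars(b)))
```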
    

Of course, there's no good reason, other than fun, to do this on purpose.


If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionary, while still available, were further compacted (the implementation of dict.__sizeof__ also changed, so some differences should come up in the values returned from getsizeof).
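
A quick version-agnostic check of that last point (the absolute numbers printed on 3.6+ differ from the 3.5 transcripts above, so only the relative ordering is asserted):

```python
import sys

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)
# On 3.6+ both dict forms are more compact, but the split/combined
# gap for instance namespaces remains.
assert sys.getsizeof(vars(f)) < sys.getsizeof(dict(vars(f)))
```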
