Why is the __dict__ of instances so much smaller in size in Python 3?


Question

In Python, dictionaries created for the instances of a class are tiny compared to dictionaries created containing the same attributes of that class:

import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)

When using Python 3.5.2, the following calls to getsizeof produce:

>>> sys.getsizeof(vars(f))  # vars gets obj.__dict__
96 
>>> sys.getsizeof(dict(vars(f)))
288

288 - 96 = 192 bytes saved!

Using Python 2.7.12, though, the same calls return:

>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280

0 bytes saved.

In both cases, the dictionaries obviously have exactly the same contents:

>>> vars(f) == dict(vars(f))
True

so this isn't a factor. Also, this applies only to Python 3.

So, what's going on here? Why is the size of an instance's __dict__ so tiny in Python 3?

Solution

In short:

Instance __dict__s are implemented differently from the 'normal' dictionaries created with dict or {}. Instance dictionaries share their keys and hashes, and keep a separate array for the part that differs: the values. sys.getsizeof counts only those values when calculating the size of an instance dict.

A bit more:

As of Python 3.3, dictionaries in CPython are implemented in one of two forms:

  • Combined dictionaries: all of the dictionary's values are stored alongside each entry's key and hash (the me_value struct member). As far as I know, this form is used for dictionaries created with dict or {}, as well as for module namespaces.
  • Split tables: the values are stored in a separate array, while the keys and hashes are shared. This is the key-sharing dictionary described in PEP 412 -- Key-Sharing Dictionary: instances of a given class share the keys (and hashes) for their __dict__ and differ only in the corresponding values. The implementation landed in Python 3.3, so earlier versions of the 3.x family, as well as Python 2.x, don't have it.
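The split form is easy to observe: all instances of a class report the same values-only size from getsizeof, while a dict copy of the same contents is reported as larger because it owns its keys and hashes. A minimal sketch (the class name Point is illustrative; exact byte counts vary between CPython versions):

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1, p2 = Point(1, 2), Point(3, 4)

# Split tables: both instance dicts share one keys/hashes array,
# so getsizeof reports only the per-instance values portion.
assert sys.getsizeof(vars(p1)) == sys.getsizeof(vars(p2))

# A dict() copy is a combined dictionary that stores its own keys
# and hashes, so it is reported as larger than the instance dict.
assert sys.getsizeof(vars(p1)) < sys.getsizeof(dict(vars(p1)))
```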

    The implementation of __sizeof__ for dictionaries takes this fact into account and, when calculating the size of a split dictionary, only counts the size corresponding to the values array.

    It is, thankfully, self-explanatory:

    Py_ssize_t size, res;

    size = DK_SIZE(mp->ma_keys);
    res = _PyObject_SIZE(Py_TYPE(mp));
    if (mp->ma_values)                   /* Add the values to the result */
        res += size * sizeof(PyObject*);
    /* If the dictionary is split, the keys portion is accounted-for
       in the type object. */
    if (mp->ma_keys->dk_refcnt == 1)     /* Add keys/hashes size to res */
        res += sizeof(PyDictKeysObject) + (size-1) * sizeof(PyDictKeyEntry);
    return res;

    As far as I know, split-table dictionaries are created only for the namespaces of instances; using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.
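This is where the savings add up in practice: because every instance of a class shares a single keys/hashes table, creating many instances only pays for the values arrays. A small sketch under that assumption (byte counts vary by CPython version):

```python
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

# All instances with identical attribute names share one keys/hashes
# table, so every per-instance __dict__ reports the same small,
# values-only size.
instances = [Foo(i, i * 2) for i in range(1000)]
sizes = {sys.getsizeof(vars(obj)) for obj in instances}

assert len(sizes) == 1  # one shared, values-only size for all instances
assert min(sizes) < sys.getsizeof(dict(vars(instances[0])))
```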


    As an aside, since it's fun, we can always break this optimization. I've found two ways so far: a silly one, and a more sensible scenario:

    1. Being silly:

      >>> f = Foo(20, 30)
      >>> getsizeof(vars(f))
      96
      >>> vars(f).update({1:1})  # add a non-string key
      >>> getsizeof(vars(f))
      288
      

      Split tables only support string keys; adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all the memory gains.

    2. A scenario that might happen:

      >>> f1, f2 = Foo(20, 30), Foo(30, 40)
      >>> for i, j in enumerate([f1, f2]):
      ...    setattr(j, 'i'+str(i), i)
      ...    print(getsizeof(vars(j)))
      96
      288
      

      Different keys being inserted into the instances of a class will eventually cause the split table to be combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.

      # after running previous snippet
      >>> getsizeof(vars(Foo(100, 200)))
      288
      

    Of course, there's no good reason, other than fun, to do this on purpose.


    In case anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionaries, while still available, are just further compacted (the implementation of dict.__sizeof__ also changed, so some differences should come up in the values returned from getsizeof).

