In what structure is a Python object stored in memory?

Question

Say I have a class A:

class A(object):
    def __init__(self, x):
        self.x = x

    def __str__(self):
        # __str__ must return a str, so convert non-string values such as 1
        return str(self.x)

I use sys.getsizeof to see how many bytes an instance of A takes:

>>> import sys
>>> sys.getsizeof(A(1))
64
>>> sys.getsizeof(A('a'))
64
>>> sys.getsizeof(A('aaa'))
64

As illustrated in the experiment above, the size of an A object is the same no matter what self.x is.

So how does Python store an object internally?

Solution

It depends on what kind of object, and also which Python implementation :-)

In CPython, which is what most people use when they use Python, all Python objects are represented by a C struct, PyObject. Everything that "stores an object" really stores a PyObject *. The PyObject struct holds the bare minimum of information: the object's type (a pointer to another PyObject) and its reference count (an ssize_t-sized integer). Types defined in C extend this struct with the extra information they need to store in the object itself, and sometimes allocate extra data separately.
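To make the two header fields concrete, here is a minimal sketch that peeks at them with ctypes. It assumes a standard CPython build (not a debug or free-threaded build) in which id() returns the object's address and the header is a reference count followed by a type pointer. Those are implementation details, so treat this as an illustration rather than a supported API.

```python
import ctypes

obj = object()

# In CPython, id() returns the object's memory address (implementation detail).
address = id(obj)

# Assumed header layout on a standard build:
# an ssize_t-sized reference count, then a pointer to the type object.
refcount = ctypes.c_ssize_t.from_address(address).value
type_pointer = ctypes.c_void_p.from_address(
    address + ctypes.sizeof(ctypes.c_ssize_t)
).value

print("reference count field:", refcount)
print("type pointer is id(type(obj)):", type_pointer == id(type(obj)))
print("bare header size in bytes:", object.__basicsize__)
```

A plain object() carries nothing beyond this header, which is why object.__basicsize__ equals the size of those two fields on such a build.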

For example, tuples (implemented as a PyTupleObject "extending" a PyObject struct) store their length and the PyObject pointers they contain inside the struct itself. The struct definition contains a 1-length array, but the implementation allocates a block of memory of the right size to hold the PyTupleObject struct plus exactly as many items as the tuple should hold. In the same way, strings (PyStringObject, the Python 2 str type) store their length, their cached hash value, some string-caching ("interning") bookkeeping, and the actual characters of their data, held inline as a char array. Tuples and strings are thus single blocks of memory.
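The single-block layout of tuples can be observed from Python: each extra item should add exactly one pointer to the reported size. This is a small sketch assuming CPython, where sys.getsizeof reflects the allocated struct; the variable names are only illustrative.

```python
import sys

# Bytes used per stored PyObject pointer in a tuple.
pointer_size = tuple.__itemsize__

# Print the size of tuples holding 0, 1, 2 and 3 items.
for n in range(4):
    print(n, sys.getsizeof(tuple(range(n))))

# Adding one item grows the tuple by exactly one pointer.
growth = sys.getsizeof((1, 2, 3)) - sys.getsizeof((1, 2))
print("growth per item:", growth, "pointer size:", pointer_size)
```

Note that the sizes count only the pointers held by the tuple, not the objects those pointers refer to.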

On the other hand, lists (PyListObject) store their length, a PyObject ** for their data, and another ssize_t to keep track of how much room they allocated for the data. Because Python stores PyObject pointers everywhere, you can't grow a PyObject struct once it's allocated: doing so may require the struct to move, which would mean finding all pointers to it and updating them. Because a list may need to grow, it has to allocate its data separately from the PyObject struct. Tuples and strings cannot grow, so they don't need this. Dicts (PyDictObject) work the same way, although each entry stores the key, the value, and the cached hash value of the key, instead of just the item. Dicts also have some extra overhead to accommodate small dicts and specialized lookup functions.
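The separately allocated, over-allocated item array of a list shows up as stepwise growth in sys.getsizeof while the list object itself stays at the same address. The sketch below assumes CPython, where list sizes include the currently allocated capacity; the exact step values vary between versions, so none are shown.

```python
import sys

items = []
identity = id(items)  # address of the list struct in CPython
sizes = []

for i in range(32):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# The size changes only when the item array is reallocated,
# so there are far fewer distinct sizes than appends.
distinct_sizes = sorted(set(sizes))
print(distinct_sizes)

# The list struct never moved, only its separately allocated data did.
same_object = id(items) == identity
print("same list object throughout:", same_object)
```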

But these are all types in C, and you can usually see how much memory they would use just by looking at the C source. Instances of classes defined in Python rather than C are not so easy. The simplest case, instances of classic classes, is not so difficult: it's a PyObject that stores a PyObject * to its class (which is not the same thing as the type already stored in the PyObject struct), a PyObject * to its __dict__ attribute (which holds all other instance attributes), and a PyObject * to its weakreflist (which is used by the weakref module, and only initialized if necessary). The instance's __dict__ is usually unique to the instance, so when calculating the "memory size" of such an instance you usually want to count the size of the attribute dict as well. But it doesn't have to be specific to the instance! __dict__ can be assigned to just fine.
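This is also why the experiment in the question always reports the same number: sys.getsizeof measures only the instance struct, not the attribute values it refers to. The sketch below uses Python 3, where classic classes no longer exist, so it shows the same idea with an ordinary class; the names small, large and shared are just illustrative.

```python
import sys

class A(object):
    def __init__(self, x):
        self.x = x

small = A(1)
large = A("a" * 10_000)

# Only the instance struct is measured, so the attribute value is irrelevant.
same_shell_size = sys.getsizeof(small) == sys.getsizeof(large)
print("same reported size:", same_shell_size)

# __dict__ can be assigned to, and two instances can even share one dict.
shared = {"x": 42}
first, second = A(0), A(0)
first.__dict__ = shared
second.__dict__ = shared
first.y = "visible through both"
print(second.y)
```

With a shared dict like this, attributing the dict's memory to either instance is already a judgment call.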

New-style classes complicate matters. Unlike with classic classes, instances of new-style classes are not separate C types, so they do not need to store the object's class separately. They do have room for the __dict__ and weakreflist references, but unlike classic instances they don't require the __dict__ attribute for arbitrary attributes. If the class (and all its base classes) uses __slots__ to define a strict set of attributes, and none of those attributes is named __dict__, the instance does not allow arbitrary attributes and no dict is allocated. On the other hand, attributes defined by __slots__ have to be stored somewhere. This is done by storing the PyObject pointers for the values of those attributes directly in the PyObject struct, much like is done with types written in C. Each entry in __slots__ will thus take up a PyObject *, regardless of whether the attribute is set or not.
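The cost of one pointer per __slots__ entry can be checked by comparing the __basicsize__ of two otherwise identical classes. This is a sketch assuming CPython; OneSlot and TwoSlots are hypothetical classes made up for the illustration.

```python
import struct

class OneSlot(object):
    __slots__ = ("a",)

class TwoSlots(object):
    __slots__ = ("a", "b")

pointer_size = struct.calcsize("P")  # size of a C pointer on this platform

# Each slot reserves one PyObject pointer in the instance struct,
# whether or not the attribute is ever set.
per_slot = TwoSlots.__basicsize__ - OneSlot.__basicsize__
print("bytes per slot:", per_slot, "pointer size:", pointer_size)

point = TwoSlots()
point.a = 1  # "b" stays unset but its space is still reserved

# Without a __dict__, arbitrary attributes are rejected.
try:
    point.c = 3
    rejected = False
except AttributeError:
    rejected = True
print("arbitrary attribute rejected:", rejected)
```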

All that said, the problem remains that since everything in Python is an object and everything that holds an object just holds a reference, it's sometimes very difficult to draw the line between objects. Two objects can refer to the same bit of data. They may hold the only two references to that data. Getting rid of both objects also gets rid of the data. Do they both own the data? Does only one of them, but if so, which one? Or would you say they own half the data, even though getting rid of one object doesn't release half the data? Weakrefs can make this even more complicated: two objects can refer to the same data, but deleting one of the objects may cause the other object to also get rid of its reference to that data, causing the data to be cleaned up after all.

Fortunately, the common case is fairly easy to figure out. There are memory debuggers for Python that do a reasonable job of keeping track of these things, like heapy. And as long as your class (and its base classes) is reasonably simple, you can make an educated guess at how much memory its instances would take up, especially in large numbers. If you really want to know the exact sizes of your data structures, consult the CPython source; most builtin types are simple structs described in Include/<type>object.h and implemented in Objects/<type>object.c. The PyObject struct itself is described in Include/object.h. Just keep in mind: it's pointers all the way down; those take up room too.
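For an educated guess without a full memory debugger, one common approach is to walk the references yourself and count each object once. The helper below, deep_getsizeof, is a hypothetical name and only a rough sketch: it follows the builtin containers and the instance __dict__, ignores __slots__ and anything held by C extensions, and counts shared data in full for whichever object reaches it first.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size in bytes; each object is counted once by identity."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted through another reference
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)

    # Follow the per-instance attribute dict, if there is one.
    instance_dict = getattr(obj, "__dict__", None)
    if isinstance(instance_dict, dict):
        size += deep_getsizeof(instance_dict, seen)
    return size


class A(object):
    def __init__(self, x):
        self.x = x

payload = "a" * 1000
instance = A(payload)
pair = [payload, payload]  # two references to the same string

print(sys.getsizeof(instance), deep_getsizeof(instance))
print(deep_getsizeof(pair))
```

Because the string in pair is reached twice but counted once, the result is the list plus a single copy of the string, which illustrates the ownership question raised above.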
