Python列表的二进制布局 [英] Binary layout of Python lists

查看:58
本文介绍了Python列表的二进制布局的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个程序,需要了解Python/Cython中不同数据容器的效率(在内存方面).所述容器之一是标准的Python list .

I am writing a program where I need to know the efficiency (memory wise) of different data containers in Python / Cython. One of said containers is the standard Python list.

Python列表让我不胜其烦,因为我不知道它在二进制级别上是如何工作的.与Python不同,C的数组易于理解,因为所有元素都是同一类型,并且空间是提前声明的.这意味着,当程序员想进入数组并为数组建立索引时,程序会数学上知道要访问的内存地址.但是问题是,Python列表可以存储许多不同的数据类型,甚至可以将嵌套列表存储在列表内部.这些数据结构的大小一直在变化,并且列表仍然保留它们,并考​​虑了这些变化.是否存在额外的分隔符内存,以使列表保持原样?

The Python list is tripping me up because I do not know how it works on the binary level. Unlike Python, C's arrays are easy to understand, because all of the elements are the same type, and the space is declared ahead of time. This means when the programmer wants to go in and index the array, the program knows mathematically what memory address to go to. But the problem is, a Python list can store many different data types, and even nested lists inside of a list. The size of these data structures changes all the time, and the list still holds them, accounting for the changes. Does extra separator memory exist to make the list as dynamic as it is?

如果可以的话,我希望在RAM中使用示例列表的实际二进制布局,并用每个字节表示的内容进行注释.当我在二进制级别上工作时,这将有助于我充分了解列表的内部工作原理.

If you could, I would appreciate an actual binary layout of an example list in RAM, annotated with what each byte represents. This will help me to fully understand the inner workings of the list, as I am working on the binary level.

推荐答案

列表对象在 Include/listobject.h .结构真的很简单:

The list object is defined in Include/listobject.h. The structure is really simple:

typedef struct {
    PyObject_VAR_HEAD
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     * list.sort() temporarily sets allocated to -1 to detect mutations.
     *
     * Items must normally not be NULL, except during construction when
     * the list is not yet visible outside the function that builds it.
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_VAR_HEAD 被定义为

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

然后,列表对象基本上是这样的:

Basically, then, a list object looks like this:

[ssize_t ob_refcnt]
[type *ob_type]
[ssize_t ob_size]
[object **ob_item] -> [object *][object *][object *]...
[ssize_t allocated]

请注意, len 会检索 ob_size 的值.

ob_item 指向 PyObject * 指针数组.列表中的每个元素都是一个Python对象,Python对象始终通过引用传递(em)(在C-API级别,作为指向实际 PyObject 的指针).因此,列表仅存储指向对象的指针,而不存储对象本身.

ob_item points to an array of PyObject * pointers. Each element in a list is a Python object, and Python objects are always passed by reference (at the C-API level, as pointers to the actual PyObjects). Therefore, lists only store pointers to objects, and not the objects themselves.

当列表填满时,将对其进行重新分配. allocated 跟踪列表最多可以容纳多少个元素(在重新分配之前).重新分配算法位于 Objects/listobject.c :

When a list fills up, it will be reallocated. allocated tracks how many elements the list can hold at maximum (before reallocation). The reallocation algorithm is in Objects/listobject.c:

/* Ensure ob_item has room for at least newsize elements, and set
 * ob_size to newsize.  If newsize > ob_size on entry, the content
 * of the new slots at exit is undefined heap trash; it's the caller's
 * responsibility to overwrite them with sane values.
 * The number of allocated elements may grow, shrink, or stay the same.
 * Failure is impossible if newsize <= self.allocated on entry, although
 * that partly relies on an assumption that the system realloc() never
 * fails when passed a number of bytes <= the number of bytes last
 * allocated (the C standard doesn't guarantee this, but it's hard to
 * imagine a realloc implementation where it wouldn't be true).
 * Note that self->ob_item may change, and even if newsize is less
 * than ob_size on entry.
 */
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated;
    Py_ssize_t allocated = self->allocated;

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

    /* check for integer overflow */
    if (new_allocated > PY_SIZE_MAX - newsize) {
        PyErr_NoMemory();
        return -1;
    } else {
        new_allocated += newsize;
    }

    if (newsize == 0)
        new_allocated = 0;
    items = self->ob_item;
    if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
        PyMem_RESIZE(items, PyObject *, new_allocated);
    else
        items = NULL;
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}

从评论中可以看到,列表按以下顺序缓慢增长:

As you can see from the comments, lists grow rather slowly, in the following sequence:

0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...

这篇关于Python列表的二进制布局的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆