Large memory footprint of integers compared with result of sys.getsizeof()


Problem description

Python integer objects in the range [1, 2^30) need 28 bytes, as provided by sys.getsizeof() and explained for example in this SO-post.
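This figure is easy to reproduce (assuming a 64-bit CPython, where a PyLongObject consists of a 24-byte header plus 4 bytes per 30-bit "digit"):

```python
import sys

# 64-bit CPython: 24-byte object header + 4 bytes per 30-bit "digit"
print(sys.getsizeof(1))          # 28 -> one digit
print(sys.getsizeof(2**30 - 1))  # 28 -> still one digit
print(sys.getsizeof(2**30))      # 32 -> two digits
```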

However, when I measure the memory footprint with the following script:

#int_list.py:
import sys

N=int(sys.argv[1])
lst=[0]*N            # no overallocation

for i in range(N):
    lst[i]=1000+i    # ints not from integer pool

via

/usr/bin/time -fpeak_used_memory:%M python3 int_list.py <N>

I get the following peak memory values (Linux-x64, Python 3.6.2):

   N     Peak memory in Kb        bytes/integer
-------------------------------------------   
   1            9220              
   1e7        404712                40.50 
   2e7        800612                40.52 
   3e7       1196204                40.52
   4e7       1591948                40.52

So it looks as if 40.5 bytes are needed per integer object, i.e. 12.5 bytes more than sys.getsizeof() yields.

The additional 8 bytes are easy to explain: the list lst doesn't hold the integer objects themselves but references to them, which means an additional pointer, i.e. 8 bytes, is needed per element.

But what are the other 4.5 bytes used for?

The following causes can be ruled out:


  • The size of integer objects is variable, but 10^7 is smaller than 2^30, so all integers are 28 bytes large.
  • There is no overallocation in the list lst, which can easily be checked via sys.getsizeof(lst): it yields 8 times the number of elements, plus a very small overhead.
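The second point can be verified directly; a minimal check (the constant overhead equals the empty-list header, 56 bytes on a 64-bit CPython):

```python
import sys

N = 10**5
lst = [0] * N
# [0]*N allocates exactly N slots, so the size is one 8-byte pointer
# per element plus only the fixed list header - no overallocation
overhead = sys.getsizeof(lst) - 8 * N
print(overhead)  # 56 on 64-bit CPython (== sys.getsizeof([]))
```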

Answer

@Nathan's suggestion is surprisingly not the solution, due to some subtle details of CPython's longint implementation. With his explanation, the memory footprint for

...
lst[i] = (1<<30)+i

should still be 40.52, because sys.getsizeof(1<<30) is 32, but the measurements show it to be 48.56. On the other hand, for

...
lst[i] = (1<<60)+i

the footprint is still 48.56, despite the fact that sys.getsizeof(1<<60) is 36.

The reason: sys.getsizeof() doesn't tell us the real memory footprint of the result of a summation a+b, which is


  • 32 bytes for 1000+i
  • 36 bytes for (1<<30)+i
  • 40 bytes for (1<<60)+i

That happens because when two integers are added in x_add, the resulting integer is at first allocated with one "digit", i.e. 4 bytes, more than the maximum of a and b:

static PyLongObject *
x_add(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
    PyLongObject *z;
    ...
    /* Ensure a is the larger of the two: */
    ...
    z = _PyLong_New(size_a+1);  
    ...

After the addition, the result is normalized:

 ...
 return long_normalize(z);

};

i.e. possible leading zeros are discarded, but the memory isn't released - 4 bytes aren't worth it; the source of the function can be found here.
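Putting the x_add behavior into numbers, a small model (an illustrative sketch; the helper names n_digits/allocated_bytes are mine, and the constants assume a 64-bit CPython) predicts exactly the footprints listed above:

```python
def n_digits(x):
    # number of 30-bit digits CPython uses for a positive int
    return max(1, (x.bit_length() + 29) // 30)

def allocated_bytes(a, b):
    # x_add allocates max(digits)+1 digits; long_normalize trims the
    # digit count afterwards but does not shrink the allocation
    digits = max(n_digits(a), n_digits(b)) + 1
    return 24 + 4 * digits  # 24-byte header + 4 bytes per digit

print(allocated_bytes(1000, 7))     # 32
print(allocated_bytes(1 << 30, 7))  # 36
print(allocated_bytes(1 << 60, 7))  # 40
```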

Now we can use @Nathan's insight to explain why the footprint of (1<<30)+i is 48.56 and not 44.xy: the py_malloc allocator uses memory blocks with an alignment of 8 bytes, which means the 36 bytes will be stored in a block of size 40 - the same as for the result of (1<<60)+i (keep the additional 8 bytes for the pointer in mind).

To explain the remaining 0.5 bytes we need to dive deeper into the details of the py_malloc allocator. A good overview is the source code itself; my last attempt to describe it can be found in this SO-post.

In a nutshell, the allocator manages memory in arenas of 256KB each. When an arena is allocated, its memory is reserved but not committed; memory counts as "used" only once a so-called pool is touched. A pool is 4KB big (POOL_SIZE) and is used only for memory blocks of one size class - in our case 32 bytes. That means the resolution of peak_used_memory is 4KB and cannot be responsible for those 0.5 bytes.

However, these pools must be managed, which leads to additional overhead: py_malloc needs one pool_header per pool:

/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks    */
    block *freeblock;                   /* pool's free list head         */
    struct pool_header *nextpool;       /* next pool of this size class  */
    struct pool_header *prevpool;       /* previous pool       ""        */
    uint arenaindex;                    /* index into arenas of base adr */
    uint szidx;                         /* block size class index        */
    uint nextoffset;                    /* bytes to virgin block         */
    uint maxnextoffset;                 /* largest valid nextoffset      */
};

The size of this struct is 48 bytes (called POOL_OVERHEAD) on my Linux_64 machine. This pool_header is part of the pool itself (a quite smart way to avoid an additional allocation via the C-runtime memory allocator) and takes the place of two 32-byte blocks, which means a pool has room for only 126 32-byte integers:

/* Return total number of blocks in pool of size index I, as a uint. */
#define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))

This leads to:


  • 4Kb/126 = 32.51 bytes footprint for 1000+i, plus an additional 8 bytes for the pointer.
  • (1<<30)+i needs 40 bytes, which means 4Kb has room for 102 blocks, of which one (16 bytes remain when the pool is divided into 40-byte blocks, and they can be used for the pool_header) is used for the pool_header, leading to 4Kb/101 = 40.55 bytes (plus the 8-byte pointer).
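The arithmetic from both bullets can be reproduced in a few lines (POOL_SIZE and POOL_OVERHEAD taken from the CPython 3.6 source; a back-of-the-envelope sketch, not a measurement):

```python
POOL_SIZE = 4096    # one py_malloc pool is 4KB
POOL_OVERHEAD = 48  # sizeof(struct pool_header) on 64-bit Linux

def bytes_per_int(block_size):
    # number of usable blocks once the pool_header is subtracted
    nblocks = (POOL_SIZE - POOL_OVERHEAD) // block_size
    # the whole 4KB pool is charged to nblocks integers,
    # plus 8 bytes for the list's pointer to each of them
    return POOL_SIZE / nblocks + 8

print(round(bytes_per_int(32), 2))  # 40.51 for 1000+i
print(round(bytes_per_int(40), 2))  # 48.55 for (1<<30)+i and (1<<60)+i
```

These predictions match the measured 40.50 and 48.56 bytes per integer almost exactly.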

We can also see that there is some additional overhead responsible for about 0.01 bytes per integer - not big enough for me to care about.
