Large memory footprint of integers compared with result of sys.getsizeof()


Problem description

Python integer objects in the range [1, 2^30) need 28 bytes, as provided by sys.getsizeof() and explained for example in this SO-post.
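This figure is easy to reproduce (assuming a 64-bit CPython, where a PyLongObject consists of a 24-byte header plus 4 bytes per 30-bit "digit"):

```python
import sys

# 64-bit CPython: 24-byte object header + 4 bytes per 30-bit "digit"
print(sys.getsizeof(1))          # 28 -> one digit
print(sys.getsizeof(2**30 - 1))  # 28 -> still one digit
print(sys.getsizeof(2**30))      # 32 -> two digits
```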

However, when I measure the memory footprint with the following script:

#int_list.py:
import sys

N=int(sys.argv[1])
lst=[0]*N            # no overallocation

for i in range(N):
    lst[i]=1000+i    # ints not from integer pool

via

/usr/bin/time -fpeak_used_memory:%M python3 int_list.py <N>

I get the following peak memory values (Linux-x64, Python 3.6.2):

   N     Peak memory in Kb        bytes/integer
-------------------------------------------   
   1            9220              
   1e7        404712                40.50 
   2e7        800612                40.52 
   3e7       1196204                40.52
   4e7       1591948                40.52

So it looks as if 40.5 bytes are needed per integer object, i.e. 12.5 bytes more than sys.getsizeof() yields.

The additional 8 bytes are easy to explain: the list lst doesn't hold the integer objects themselves but references to them, which means an additional pointer, i.e. 8 bytes, is needed per element.

But what are the other 4.5 bytes used for?

The following causes can be ruled out:


  • The size of integer objects is variable, but 10^7 is smaller than 2^30, so all integers are 28 bytes large.
  • There is no overallocation in the list lst, which can easily be checked via sys.getsizeof(lst): it yields 8 times the number of elements, plus a very small overhead.
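The second point can be verified directly; a minimal check (the constant overhead equals the empty-list header, 56 bytes on a 64-bit CPython):

```python
import sys

N = 10**5
lst = [0] * N
# [0]*N allocates exactly N slots, so the size is one 8-byte pointer
# per element plus only the fixed list header - no overallocation
overhead = sys.getsizeof(lst) - 8 * N
print(overhead)  # 56 on 64-bit CPython (== sys.getsizeof([]))
```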

Answer

@Nathan's suggestion is surprisingly not the solution, due to some subtle details of CPython's longint implementation. With his explanation, the memory footprint for

...
lst[i] = (1<<30)+i

should still be 40.52, because sys.getsizeof(1<<30) is 32, but the measurements show it to be 48.56. On the other hand, for

...
lst[i] = (1<<60)+i

the footprint is still 48.56, despite the fact that sys.getsizeof(1<<60) is 36.

The reason: sys.getsizeof() doesn't tell us the real memory footprint of the result of a summation a+b, which is


  • 32 bytes for 1000+i
  • 36 bytes for (1<<30)+i
  • 40 bytes for (1<<60)+i

That happens because when two integers are added in x_add, the resulting integer is at first allocated with one "digit", i.e. 4 bytes, more than the maximum of a and b:

static PyLongObject *
x_add(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
    PyLongObject *z;
    ...
    /* Ensure a is the larger of the two: */
    ...
    z = _PyLong_New(size_a+1);  
    ...

After the addition, the result is normalized:

 ...
 return long_normalize(z);

};

i.e. possible leading zeros are discarded, but the memory isn't released - 4 bytes aren't worth it; the source of the function can be found here.
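Putting the x_add behavior into numbers, a small model (an illustrative sketch; the helper names n_digits/allocated_bytes are mine, and the constants assume a 64-bit CPython) predicts exactly the footprints listed above:

```python
def n_digits(x):
    # number of 30-bit digits CPython uses for a positive int
    return max(1, (x.bit_length() + 29) // 30)

def allocated_bytes(a, b):
    # x_add allocates max(digits)+1 digits; long_normalize trims the
    # digit count afterwards but does not shrink the allocation
    digits = max(n_digits(a), n_digits(b)) + 1
    return 24 + 4 * digits  # 24-byte header + 4 bytes per digit

print(allocated_bytes(1000, 7))     # 32
print(allocated_bytes(1 << 30, 7))  # 36
print(allocated_bytes(1 << 60, 7))  # 40
```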

Now we can use @Nathan's insight to explain why the footprint of (1<<30)+i is 48.56 and not 44.xy: the py_malloc allocator uses memory blocks with an alignment of 8 bytes, which means the 36 bytes will be stored in a block of size 40 - the same as for the result of (1<<60)+i (keep the additional 8 bytes for the pointer in mind).

To explain the remaining 0.5 bytes we need to dive deeper into the details of the py_malloc allocator. A good overview is the source code itself; my last attempt to describe it can be found in this SO-post.

In a nutshell, the allocator manages memory in arenas of 256KB each. When an arena is allocated, its memory is reserved but not committed; memory counts as "used" only once a so-called pool is touched. A pool is 4KB big (POOL_SIZE) and is used only for memory blocks of one size class - in our case 32 bytes. That means the resolution of peak_used_memory is 4KB and cannot be responsible for those 0.5 bytes.

However, these pools must be managed, which leads to additional overhead: py_malloc needs one pool_header per pool:

/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks    */
    block *freeblock;                   /* pool's free list head         */
    struct pool_header *nextpool;       /* next pool of this size class  */
    struct pool_header *prevpool;       /* previous pool       ""        */
    uint arenaindex;                    /* index into arenas of base adr */
    uint szidx;                         /* block size class index        */
    uint nextoffset;                    /* bytes to virgin block         */
    uint maxnextoffset;                 /* largest valid nextoffset      */
};

The size of this struct is 48 bytes (called POOL_OVERHEAD) on my Linux_64 machine. This pool_header is part of the pool itself (a quite smart way to avoid an additional allocation via the C-runtime memory allocator) and takes the place of two 32-byte blocks, which means a pool has room for only 126 32-byte integers:

/* Return total number of blocks in pool of size index I, as a uint. */
#define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))

This leads to:


  • 4Kb/126 = 32.51 bytes footprint for 1000+i, plus an additional 8 bytes for the pointer.
  • (1<<30)+i needs 40 bytes, which means 4Kb has room for 102 blocks, of which one (16 bytes remain when the pool is divided into 40-byte blocks, and they can be used for the pool_header) is used for the pool_header, leading to 4Kb/101 = 40.55 bytes (plus the 8-byte pointer).
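The arithmetic from both bullets can be reproduced in a few lines (POOL_SIZE and POOL_OVERHEAD taken from the CPython 3.6 source; a back-of-the-envelope sketch, not a measurement):

```python
POOL_SIZE = 4096    # one py_malloc pool is 4KB
POOL_OVERHEAD = 48  # sizeof(struct pool_header) on 64-bit Linux

def bytes_per_int(block_size):
    # number of usable blocks once the pool_header is subtracted
    nblocks = (POOL_SIZE - POOL_OVERHEAD) // block_size
    # the whole 4KB pool is charged to nblocks integers,
    # plus 8 bytes for the list's pointer to each of them
    return POOL_SIZE / nblocks + 8

print(round(bytes_per_int(32), 2))  # 40.51 for 1000+i
print(round(bytes_per_int(40), 2))  # 48.55 for (1<<30)+i and (1<<60)+i
```

These predictions match the measured 40.50 and 48.56 bytes per integer almost exactly.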

We can also see that there is some additional overhead responsible for about 0.01 bytes per integer - not big enough for me to care about.
