与sys.getsizeof()的结果相比,整数的内存占用量大 [英] Large memory footprint of integers compared with result of sys.getsizeof()
问题描述
[1,2 ^ 30)
范围内的Python整数对象需要 28
个字节,如下所示由 sys.getsizeof()
进行解释,例如在此SO-帖子。
Python-Integer-objects in the range [1,2^30)
need 28
byte, as provided by sys.getsizeof()
and explained for example in this SO-post.
但是,当我使用以下脚本测量内存占用量时:
However, when I measure the memory footprint with the following script:
#int_list.py:
import sys
N=int(sys.argv[1])
lst=[0]*N # no overallocation
for i in range(N):
lst[i]=1000+i # ints not from integer pool
通过
/usr/bin/time -fpeak_used_memory:%M python3 int_list.py <N>
我得到以下峰值内存值(Linux-x64,Python 3.6.2):
I get the following peak memory values (Linux-x64, Python 3.6.2):
N Peak memory in Kb bytes/integer
-------------------------------------------
1 9220
1e7 404712 40.50
2e7 800612 40.52
3e7 1196204 40.52
4e7 1591948 40.52
所以看起来好像每个一个整数对象需要40.5
个字节,即比 sys.getsizeof()$产生的字节多
12.5
个字节c $ c>。
So it looks as if 40.5
bytes are needed per one integer object, i.e. 12.5
bytes more than yielded by sys.getsizeof()
.
其他 8
个字节很容易解释-列表 lst
不保存整数对象,但引用它们-这意味着需要一个附加指针,即 8
个字节。
Additional 8
bytes are easy to explain - the list lst
doesn't hold the integer objects, but references to them - that means an additonal pointer, i.e. 8
bytes, are needed.
但是,其他 4.5
个字节又如何呢?
However, what about the other 4.5
bytes, what are they used for?
可以排除以下原因:
- 整数对象的大小是可变的,但
10 ^ 7
小于2 ^ 30
,因此所有整数都将为28
个字节。 -
lst
列表中没有过度分配,可以通过轻松检查sys.getsizeof(lst)
的结果是元素数量乘以8
倍,再加上很小的开销。
- size of integer-objects are variable, but
10^7
is smaller than2^30
and thus all integer will be28
bytes large. - There is no overallocation in the list
lst
, which can be easily checked viasys.getsizeof(lst)
which yields8
times the number of elements, plus a very small overhead.
推荐答案
@Nathan的建议出人意料地不是解决方案,因为CPython的 longint
-实现。根据他的解释,
@Nathan's suggestion is surprisingly not the solution, due to some subtly details of CPython's longint
-implementation. With his explanation, the memory footprint for
...
lst[i] = (1<<30)+i
仍应为 40.52
,因为 sys.sizeof(1<< 30)
是 32
,但是测量结果显示它是 48.56
。另一方面,对于
should still be 40.52
, because sys.sizeof(1<<30)
is 32
, but the measurements show it to be 48.56
. On the other hand, for
...
lst[i] = (1<<60)+i
尽管足迹不足,足迹仍为 48.56
事实上, sys.sizeof(1 << <60)
是 36
。
the footprint is still 48.56
, despite the fact, that sys.sizeof(1<<60)
is 36
.
原因: sys.getsizeof()
不能告诉我们求和结果的实际内存占用量,即 a + b
The reason: the sys.getsizeof()
doesn't tell us the real memory footprint for the result of a summation, i.e. a+b
which is
-
1000 + i
- 36个字节,用于
(1 << 30)+ i
- 40个字节,用于
(1 << 60)+ i
- 32 bytes for
1000+i
- 36 bytes for
(1<<30)+i
- 40 bytes for
(1<<60)+i
之所以会发生这种情况,是因为在 x_add
,结果整数的第一个数字即4个字节,大于 a
和 b
:
That happens because when two integers are added in x_add
, the resulting integer has at first one "digit", i.e. 4 bytes, more than the maximum of a
and b
:
static PyLongObject *
x_add(PyLongObject *a, PyLongObject *b)
{
Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
PyLongObject *z;
...
/* Ensure a is the larger of the two: */
...
z = _PyLong_New(size_a+1);
...
加法后的结果归一化:
...
return long_normalize(z);
};
即可能的前导零被丢弃,但没有释放内存-4个字节不值得,函数的源可以找到此处。
i.e. the possible leading zeros are discarded, but the memory isn't released - 4 bytes aren't worth it, the source of the function can be found here.
现在,我们可以使用@Nathans洞察力来解释为什么(1 << 30)+ i
的足迹是 48.56
而不是 44.xy
:使用的 py_malloc
分配器使用对齐方式为<$ c的内存块$ c> 8 个字节,这意味着 36
个字节将存储在大小为 40 $ c $的块中c>-与
(1 <<<< 60)+ i
的结果相同(请记住要为指针保留额外的8个字节)。
Now, we can use @Nathans insight to explain, why the footprint of (1<<30)+i
is 48.56
and not 44.xy
: The used py_malloc
-allocator uses memory-blocks with alignment of 8
bytes, that means 36
bytes will be stored in a block of size 40
- the same as the result of (1<<60)+i
(keep the additional 8-bytes for pointers in mind).
要解释剩余的 0.5
个字节,我们需要更深入地研究 py_malloc
-分配器。很好的概述是源代码本身,我最后一次描述它的方法可以在 SO-post 中找到。
To explain the remaining 0.5
bytes we need to dive deeper into details of py_malloc
-allocator. A good overview is the source-code itself, my last try to describe it can be found in this SO-post.
简而言之,分配器在舞台上管理内存,每个256MB。分配竞技场后,将保留内存但未提交。仅当触摸了所谓的 pool
时,我们才将内存视为已使用。池小于 4Kb
( POOL_SIZE
),仅用于相同大小的内存块-在我们的示例中为 32
个字节。这意味着 peak_used_memory
的分辨率为4Kb,不能对那些 0.5
字节负责。
In a nutshell, the allocator manages memory in arenas, each 256MB. When an arena is allocated, the memory is reserved, but not commited. We see memory as "used", only when a so called pool
is touched. A pool is 4Kb
big (POOL_SIZE
) and is used only for memory-blocks with same size - in our case 32
byte. That means the resolution of peak_used_memory
is 4Kb and cannot be responsible for those 0.5
bytes.
但是,必须对这些池进行管理,这会导致额外的开销: py_malloc
需要每个池 pool_header
:
However, these pools must be managed, which leads to an additional overhead: py_malloc
needs a pool_header
per pool:
/* Pool for small blocks. */
struct pool_header {
union { block *_padding;
uint count; } ref; /* number of allocated blocks */
block *freeblock; /* pool's free list head */
struct pool_header *nextpool; /* next pool of this size class */
struct pool_header *prevpool; /* previous pool "" */
uint arenaindex; /* index into arenas of base adr */
uint szidx; /* block size class index */
uint nextoffset; /* bytes to virgin block */
uint maxnextoffset; /* largest valid nextoffset */
};
此结构的大小为 48
(在我的Linux_64计算机上称为 POOL_OVERHEAD
)字节。此 pool_header
是池的一部分(一种非常聪明的方法,可以避免通过cruntime-memory-allocator进行额外分配),它将代替两个 32
-byte-blocks,这意味着池具有 126
32
个字节整数的位置:
The size of this struct is 48
(called POOL_OVERHEAD
) bytes on my Linux_64 machine. This pool_header
is a part of the pool (a quite smart way to avoid additional allocation via cruntime-memory-allocator) and will take place of two 32
-byte-blocks, that means a pool has place for 126
32
byte integers:
/* Return total number of blocks in pool of size index I, as a uint. */
#define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))
这将导致:
-
4Kb / 126 = 32.51
1000 + i
的字节占用量,再加上指针的8个字节。 -
(30< 1)+ i
需要40
个字节,这意味着4Kb
有一个用于<$ c的地方$ c> 102 个块,其中一个(将池划分为40 $时剩余的
16
个字节) c $ c>个字节的块,它们可用于pool_header
)用于pool_header
,这导致到4Kb / 101 = 40.55
个字节(加上8
个字节指针)。
4Kb/126 = 32.51
bytes footprint for1000+i
, plus additional 8 bytes for the pointer.(30<<1)+i
needs40
bytes, that means4Kb
has place for102
blocks, of which one (there are remaining16
bytes when pool is divided in40
-bytes block, and they can be used for thepool_header
) is used forpool_header
, which leads to4Kb/101=40.55
bytes (plus8
byte pointer).
我们还可以看到,还有一些额外的开销负责每个整数 0.01
个字节-不够我照顾。
We can also see, that there are some additional overhead, responsible for ca. 0.01
byte per integer - not big enough for me to care.
这篇关于与sys.getsizeof()的结果相比,整数的内存占用量大的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!