Why does my class cost so much memory?
Problem description
from guppy import hpy
hp = hpy()

class Demo(object):
    __slots__ = ('v0', 'v1')

    def __init__(self, v0, v1):
        self.v0 = v0
        self.v1 = v1

from array import array

value = 1.01
ar = array('f')
ar2 = array('f')
for i in range(5000000):
    ar.append(value + i)
    ar2.append(value + i * 0.1 + i * 0.01 + i * 0.001 + i * 0.0001 + i * 0.000001)

a = []
for i in range(5000000):
    vex = Demo(ar[i], ar[2])
    a.append(vex)

print "Heap at the end of the function\n", hp.heap()
Here is the output:
Heap at the end of the function
Partition of a set of 15063247 objects. Total size = 650251664 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 5000000 33 320000000 49 320000000 49 __main__.Demo
1 10000108 66 240002592 37 560002592 86 float
2 368 0 42008896 6 602011488 93 list
3 2 0 40000112 6 642011600 99 array.array
4 28182 0 2214784 0 644226384 99 str
5 12741 0 1058448 0 645284832 99 tuple
6 189 0 669624 0 645954456 99 dict of module
7 371 0 588104 0 646542560 99 dict (no owner)
8 258 0 509232 0 647051792 100 dict of sip.wrappertype
9 3176 0 406528 0 647458320 100 types.CodeType
I am wondering why the Demo class costs so much memory, because the Demo class just keeps a reference to each float; it doesn't copy the float value.

getsizeof(Demo) # 984

I expected the Demo instances to cost on the order of 984 * 50W = 40215176 bytes (W = 10,000), but they actually cost 320000000 bytes. It is unbelievable, why?
sys.getsizeof() doesn't recurse into sub-objects, and you only took the size of the class, not of an instance. Each instance takes up 64 bytes, plus 24 bytes per float object (on OS X, using Python 2.7.12):
>>> d = Demo(1.0, 2.0)
>>> sys.getsizeof(d)
64
>>> sys.getsizeof(d.v0)
24
>>> sys.getsizeof(d) + sys.getsizeof(d.v0) + sys.getsizeof(d.v1)
112
Each slot only reserves memory for a pointer in the instance object; on my machine that's 8 bytes per pointer.
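That pointer-per-slot accounting is easy to verify: adding one more name to __slots__ grows every instance by exactly one pointer. A quick sketch (the class names here are made up for illustration; exact instance sizes vary across Python versions and platforms, but the per-slot delta is always one pointer on a 64-bit build):

```python
import struct
import sys

POINTER_SIZE = struct.calcsize('P')  # 8 on a 64-bit build

class Two(object):
    __slots__ = ('v0', 'v1')

class Three(object):
    __slots__ = ('v0', 'v1', 'v2')

# Each extra slot reserves room for exactly one more pointer in the instance.
extra = sys.getsizeof(Three()) - sys.getsizeof(Two())
print(extra == POINTER_SIZE)  # True
```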
There are several differences between your Demo() instances and the array:

- Instances have a minimal overhead to support reference counting and weak references, as well as containing a pointer to their class. The arrays store the values directly, without any of that overhead.
- The instance stores Python floats. These are full-fledged objects, including reference counting and weak reference support. The array stores single-precision floats as C values, while the Python float object models double-precision floats. So the instance uses 2 * 24 bytes (on my Mac) just for those floats, vs. just 4 bytes per single-precision 'f' value in an array.
- To track 5 million Demo instances, you also needed to create a list object, which is sized to handle at least 5 million object references. The array stores the C single-precision floats directly.
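The gap between C-value storage and boxed-object storage can be measured directly with sys.getsizeof(). The sketch below uses a smaller element count so it runs quickly; the exact per-value figures are CPython- and platform-specific, but on a 64-bit build the boxed form costs several times more (roughly a 24-byte float object plus an 8-byte list pointer, versus 4 bytes in the array):

```python
import sys
from array import array

n = 100000
ar = array('f', (float(i) for i in range(n)))   # n C floats, 4 bytes each
boxed = [float(i) for i in range(n)]            # n separate float objects

array_bytes = sys.getsizeof(ar)  # the 4-byte payload plus a small header
# The list itself only holds pointers; each float object is counted separately.
list_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(f) for f in boxed)

print(ar.itemsize)        # 4: single-precision C float
print(array_bytes // n)   # ~4 bytes per value
print(list_bytes // n)    # ~32 bytes per value on a 64-bit build
```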
The hp.heap() output only counts the instance footprint, not the referenced float values on each line, but the totals match up:

- 5 million times 64 bytes is 320,000,000 bytes of memory for the Demo instances.
- 10 million times 24 bytes is 240,000,000 bytes of memory for the float instances, plus a further 108 floats referenced elsewhere. Together, these two groups make up the majority of the 15 million Python objects on the heap.
- The list object you created to hold the instances contains 5 million pointers, that's 40,000,000 bytes just to point to all the Demo instances, plus the accounting overhead for that object. There are a further 367 lists on the heap, referenced by other Python code.
- 2 array instances with 5 million 4-byte floats each is 40,000,000 bytes, plus 56 bytes of overhead per array.
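That accounting reproduces the heap listing exactly. Using the 64-byte instance and 24-byte float sizes quoted above (Python 2.7 on OS X; other builds differ), the arithmetic is:

```python
# 5 million Demo instances at 64 bytes each -- row 0 of the heap listing.
demo_bytes = 5000000 * 64
print(demo_bytes)   # 320000000

# 10,000,108 floats (two per instance, plus 108 referenced elsewhere)
# at 24 bytes each -- row 1 of the heap listing.
float_bytes = 10000108 * 24
print(float_bytes)  # 240002592

# 2 arrays of 5 million 4-byte C floats, plus a 56-byte header per
# array -- row 3 of the heap listing.
array_bytes = 2 * (5000000 * 4 + 56)
print(array_bytes)  # 40000112
```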
So array objects are vastly more efficient for storing a large number of numeric values, because they store these as primitive C values. However, the disadvantage is that Python has to box each value you try to access; accessing ar[10] returns a new Python float object each time.
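Both consequences of the C-level storage are easy to demonstrate: every indexing operation boxes a fresh float object, and the 'f' typecode keeps only single precision, so values round on the way in. A small sketch (the identity check relies on CPython not caching float objects):

```python
from array import array

ar = array('f', [1.01])

x = ar[0]
y = ar[0]
# Each access boxes the stored C float into a brand-new Python float object.
print(type(x) is float)  # True
print(x is y)            # False: two separate objects from the same C value

# 'f' is single precision, so 1.01 cannot round-trip;
# 'd' (double precision) matches Python's own float exactly.
print(ar[0] == 1.01)                   # False
print(array('d', [1.01])[0] == 1.01)   # True
```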