Python: Cannot replicate a test on memory usage

Problem description

I was trying to replicate the memory usage test here.

Essentially, the post claims that given the following code snippet:

import copy
import memory_profiler

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()

invoking

python -m memory_profiler memory-profile-me.py

prints, on a 64-bit machine:

Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
================================================
 4                             @profile
 5      9.11 MB      0.00 MB   def function():
 6     40.05 MB     30.94 MB       x = list(range(1000000)) # allocate a big list
 7     89.73 MB     49.68 MB       y = copy.deepcopy(x)
 8     82.10 MB     -7.63 MB       del x
 9     82.10 MB      0.00 MB       return y

I copied and pasted the same code, but my profiler yields:

Line #    Mem usage    Increment   Line Contents
================================================
 3   44.711 MiB   44.711 MiB   @profile
 4                             def function():
 5   83.309 MiB   38.598 MiB       x = list(range(1000000))  # allocate a big list
 6   90.793 MiB    7.484 MiB       y = copy.deepcopy(x)
 7   90.793 MiB    0.000 MiB       del x
 8   90.793 MiB    0.000 MiB       return y

This post could be outdated --- either the profiler package or Python could have changed. In any case, my questions are, in Python 3.6.x:

(1) Should copy.deepcopy(x) (as defined in the code above) consume a nontrivial amount of memory?

(2) Why couldn't I replicate?

(3) If I repeat x = list(range(1000000)) after del x, would the memory increase by the same amount as I first assigned x = list(range(1000000)) (as in line 5 of my code)?

Answer

copy.deepcopy() only recursively copies mutable objects; immutable objects such as integers or strings are not copied. The list being copied consists of immutable integers, so the y copy ends up sharing references to the same integer values:

>>> import copy
>>> x = list(range(1000000))
>>> y = copy.deepcopy(x)
>>> x[-1] is y[-1]
True
>>> all(xv is yv for xv, yv in zip(x, y))
True

So the copy only needs to create a new list object with 1 million references, an object that takes a little over 8MB of memory on my Python 3.6 build on Mac OS X 10.13 (a 64-bit OS):

>>> import sys
>>> sys.getsizeof(y)
8697464
>>> sys.getsizeof(y) / 2 ** 20   # MiB
8.294548034667969

An empty list object takes 64 bytes, each reference takes 8 bytes:

>>> sys.getsizeof([])
64
>>> sys.getsizeof([None])
72

Python list objects overallocate space so they can grow; converting a range() object to a list makes it set aside a little more room for additional growth than deepcopy does, so x is slightly larger still, with room for another 125k references before it has to resize again:

>>> sys.getsizeof(x)
9000112
>>> sys.getsizeof(x) / 2 ** 20
8.583175659179688
>>> ((sys.getsizeof(x) - 64) // 8) - 10**6
125006

while the copy only has spare room left for about 87k:

>>> ((sys.getsizeof(y) - 64) // 8) - 10**6
87175

On Python 3.6 I can't replicate the article's claims either, in part because Python has seen a lot of memory management improvements, and in part because the article is wrong on several points.

The behaviour of copy.deepcopy() regarding lists and integers has never changed in the long history of the module (see the first revision, added in 1995), and the article's interpretation of the memory figures is wrong, even on Python 2.7.

Specifically, I can reproduce the results using Python 2.7. This is what I see on my machine:

$ python -V
Python 2.7.15
$ python -m memory_profiler memtest.py
Filename: memtest.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   28.406 MiB   28.406 MiB   @profile
     5                             def function():
     6   67.121 MiB   38.715 MiB       x = list(range(1000000))  # allocate a big list
     7  159.918 MiB   92.797 MiB       y = copy.deepcopy(x)
     8  159.918 MiB    0.000 MiB       del x
     9  159.918 MiB    0.000 MiB       return y

What is happening is that Python's memory management system is allocating a new chunk of memory for additional expansion. It's not that the new y list object takes nearly 93 MiB of memory; that's just the additional memory the OS has allocated to the Python process when that process requested some more memory for the object heap. The list object itself is a lot smaller.
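
As a rough illustration of the difference, here is a minimal sketch of mine (not from the original post; Unix-only, CPython) contrasting the process-level peak RSS with the size of the list object itself:

import copy
import resource
import sys

def rss_mib():
    # ru_maxrss is in KiB on Linux and in bytes on macOS, so treat the
    # result as approximate and platform-dependent.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

x = list(range(1000000))
y = copy.deepcopy(x)

print(f"process peak RSS: ~{rss_mib():.1f} MiB (Linux scaling)")
print(f"list object y alone: {sys.getsizeof(y) / 2 ** 20:.1f} MiB")
# The gap between the two numbers is the integer objects (shared with x),
# allocator bookkeeping, and whole arenas requested from the OS.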

The Python 3 tracemalloc module is a lot more accurate about what actually happens:

$ python3 -m memory_profiler --backend tracemalloc memtest.py
Filename: memtest.py

Line #    Mem usage    Increment   Line Contents
================================================
     4    0.001 MiB    0.001 MiB   @profile
     5                             def function():
     6   35.280 MiB   35.279 MiB       x = list(range(1000000))  # allocate a big list
     7   35.281 MiB    0.001 MiB       y = copy.deepcopy(x)
     8   26.698 MiB   -8.583 MiB       del x
     9   26.698 MiB    0.000 MiB       return y
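
You can also use tracemalloc directly, without memory_profiler, to confirm that the deepcopy only accounts for the new list object. A small sketch of my own, not from the original post:

import copy
import tracemalloc

tracemalloc.start()

x = list(range(1000000))
before, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes

y = copy.deepcopy(x)
after, _ = tracemalloc.get_traced_memory()

# Only the new list object with its 1 million references is counted,
# roughly 8-9 MiB, not the ~50 MB the article attributed to the copy.
print(f"deepcopy added {(after - before) / 2 ** 20:.2f} MiB")
tracemalloc.stop()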

The Python 3.x memory manager and list implementation are smarter than those in 2.7; evidently the new list object was able to fit into existing, already-available memory pre-allocated when creating x.

We can test Python 2.7's behaviour with a manually built Python 2.7.12 tracemalloc binary and a small patch to memory_profiler.py. Now we get more reassuring results on Python 2.7 as well:

Filename: memtest.py

Line #    Mem usage    Increment   Line Contents
================================================
     4    0.099 MiB    0.099 MiB   @profile
     5                             def function():
     6   31.734 MiB   31.635 MiB       x = list(range(1000000))  # allocate a big list
     7   31.726 MiB   -0.008 MiB       y = copy.deepcopy(x)
     8   23.143 MiB   -8.583 MiB       del x
     9   23.141 MiB   -0.002 MiB       return y

I note that the author was confused as well:

copy.deepcopy copies both lists, which allocates again ~50 MB (I am not sure where the additional overhead of 50 MB - 31 MB = 19 MB comes from)

(Bold emphasis mine.)

The error here is to assume that all memory changes in the Python process size can be attributed directly to specific objects, but the reality is far more complex, as the memory manager can add (and remove!) memory 'arenas', blocks of memory reserved for the heap, as needed, and will do so in larger blocks if that makes sense. The process is complex because it depends on interactions between Python's memory manager and the OS malloc implementation details. The author found an older article on Python's memory model that they misunderstood to be current; the author of that article has themselves already tried to point this out: as of Python 2.5, the claim that Python doesn't free memory is no longer true.
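
If you want to watch arenas come and go yourself, CPython exposes a private helper, sys._debugmallocstats(), which dumps pymalloc arena statistics to stderr. A quick, CPython-specific sketch (assuming a default build with pymalloc enabled):

import sys

x = list(range(1000000))
sys._debugmallocstats()  # look for the "# arenas allocated current" line

del x
# After the del, freed arenas can be handed back to the OS (since
# Python 2.5), so the arena count can drop here.
sys._debugmallocstats()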

What's troubling is that the same misunderstandings then lead the author to recommend against using pickle, but in reality the module, even on Python 2, never adds more than a little bookkeeping memory to track recursive structures. See this gist for my testing methodology; using cPickle on Python 2.7 adds a one-time 46 MiB increase (doubling the create_file() call results in no further memory increase). In Python 3, the memory changes are gone altogether.
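
The shape of that test is easy to sketch; the helper name create_file() follows the description above, and the rest (including the file path) is my own approximation, not the gist's exact code:

import pickle
import tracemalloc

def create_file():
    # Pickle a large list to disk, as in the test described above.
    with open("/tmp/biglist.pkl", "wb") as f:
        pickle.dump(list(range(1000000)), f, pickle.HIGHEST_PROTOCOL)

tracemalloc.start()
create_file()
first, _ = tracemalloc.get_traced_memory()
create_file()  # doubling the call should add (almost) no lasting memory
second, _ = tracemalloc.get_traced_memory()
print(f"after 1st call: {first / 2 ** 20:.2f} MiB, "
      f"after 2nd: {second / 2 ** 20:.2f} MiB")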

I'll open a dialog with the Theano team about the post; the article is wrong and confusing, and Python 2.7 is soon to be made entirely obsolete anyway, so they really should focus on Python 3's memory model. (*)

When you create a new list from range(), rather than a copy, you'll see a similar increase in memory as when you created x the first time, because you'd create a new set of integer objects in addition to the new list object. Aside from a specific set of small integers, Python doesn't cache and re-use integer values for range() operations.
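
You can check this interned-small-integer behaviour directly (a CPython implementation detail; the cached range is roughly -5 to 256):

>>> a = list(range(1000000))
>>> b = list(range(1000000))
>>> a[100] is b[100]    # small integers are cached and shared
True
>>> a[1000] is b[1000]  # larger values are fresh objects per list
False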

(*) Addendum: I opened issue #6619 with the Theano project. The project agreed with my assessment and removed the page from their documentation, although they haven't yet updated the published version.
