Is freeing handled differently for small/large numpy arrays?


Question

I am trying to debug a memory problem with my large Python application. Most of the memory is in numpy arrays managed by Python classes, so Heapy etc. are useless, since they do not account for the memory in the numpy arrays. So I tried to manually track the memory usage using the MacOSX (10.7.5) Activity Monitor (or top if you will). I noticed the following weird behavior. On a normal python interpreter shell (2.7.3):

import numpy as np # 1.7.1
# Activity Monitor: 12.8 MB
a = np.zeros((1000, 1000, 17)) # a "large" array
# 142.5 MB
del a
# 12.8 MB (so far so good, the array got freed)
a = np.zeros((1000, 1000, 16)) # a "small" array
# 134.9 MB
del a
# 134.9 MB (the system didn't get back the memory)
import gc
gc.collect()
# 134.9 MB
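For reference, the same resident-set figures can also be read from inside the interpreter instead of from Activity Monitor; a minimal sketch, assuming the third-party psutil package is available:

import os

import numpy as np
import psutil  # third-party; assumed available

proc = psutil.Process(os.getpid())

def rss_mb():
    """Current resident set size of this process, in MB."""
    return proc.memory_info().rss / 1024.0 ** 2

print(rss_mb())                    # baseline
a = np.zeros((1000, 1000, 17))     # the "large" array from above
print(rss_mb())
del a
print(rss_mb())                    # drops back: the block went to the OS
a = np.zeros((1000, 1000, 16))     # the "small" array from above
print(rss_mb())
del a
print(rss_mb())                    # stays high: the allocator keeps the block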

No matter what I do, the memory footprint of the Python session will never go below 134.9 MB again. So my question is:

Why are the resources of arrays larger than 1000x1000x17x8 bytes (found empirically on my system) properly given back to the system, while the memory of smaller arrays appears to be stuck with the Python interpreter forever?

This does appear to ratchet up, since in my real-world applications, I end up with over 2 GB of memory I can never get back from the Python interpreter. Is this intended behavior that Python reserves more and more memory depending on usage history? If yes, then Activity Monitor is just as useless as Heapy for my case. Is there anything out there that is not useless?

Answer

Reading from Numpy's policy for releasing memory, it seems like numpy does not have any special handling of memory allocation/deallocation: it simply calls free() when the reference count goes to zero. In fact, it is pretty easy to replicate the issue with any built-in Python object. The problem lies at the OS level.

Nathaniel Smith has written an explanation of what is happening in one of his replies in the linked thread:

In general, processes can request memory from the OS, but they cannot give it back. At the C level, if you call free(), then what actually happens is that the memory management library in your process makes a note for itself that that memory is not used, and may return it from a future malloc(), but from the OS's point of view it is still "allocated". (And python uses another similar system on top for malloc()/free(), but this doesn't really change anything.) So the OS memory usage you see is generally a "high water mark", the maximum amount of memory that your process ever needed.

The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations can be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see.
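That cutoff can be probed from Python. A minimal sketch, again assuming psutil is available; the exact threshold depends on the platform's allocator (and on how numpy obtains the memory), so the sizes below are illustrative only:

import os

import numpy as np
import psutil  # third-party; assumed available

proc = psutil.Process(os.getpid())

def rss_mb():
    return proc.memory_info().rss / 1024.0 ** 2

baseline = rss_mb()
# Allocate and free arrays of increasing size; if the footprint falls back
# towards the baseline after del, that block was handed back to the OS.
for n in (4, 8, 12, 16, 17, 20):
    a = np.zeros((1000, 1000, n))   # about 8*n MB of float64 zeros
    peak = rss_mb()
    del a
    print("n=%2d: peak %6.1f MB, after del %6.1f MB (baseline %.1f MB)"
          % (n, peak, rss_mb(), baseline))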

So, it seems there is no general solution to the problem. Allocating many small objects will lead to a "high memory usage" as profiled by the tools, even though that memory will be reused when needed, while allocating big objects won't show big memory usage after deallocation, because the memory is reclaimed by the OS.

You can verify this by allocating built-in Python objects:

In [1]: a = [[0] * 100 for _ in range(1000000)]

In [2]: del a

After this code I can see that the memory is not reclaimed, while doing:

In [1]: a = [[0] * 10000 for _ in range(10000)]

In [2]: del a

the memory is reclaimed.

To avoid memory problems you should either allocate big arrays and work with them (maybe using views to "simulate" small arrays?), or try to avoid having many small arrays alive at the same time. If you have a loop that creates small objects, you can explicitly deallocate the objects you no longer need at every iteration instead of doing this only at the end.
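A rough sketch of that advice (the shapes here are made up for illustration): carve views out of one preallocated array instead of creating many small ones, and drop per-iteration temporaries explicitly:

import numpy as np

# One big allocation up front; a block this size can be returned to the OS
# when it is finally freed.
slab = np.zeros((1000, 1000, 16))

# "Small arrays" become views into the slab: no new allocation per slice.
for k in range(slab.shape[2]):
    small = slab[:, :, k]    # a view, not a copy
    small += k               # in-place work writes straight into the slab

# If a loop has to create temporaries, release them every iteration instead
# of keeping them all alive until the end.
results = []
for k in range(16):
    tmp = np.random.rand(1000, 1000)   # per-iteration temporary
    results.append(tmp.sum())          # keep only the small reduced result
    del tmp                            # drop the big temporary right away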

I believe Python Memory Management gives good insight into how memory is managed in Python. Note that, on top of the "OS problem", Python adds another layer of its own to manage memory arenas, which can contribute to high memory usage with small objects.
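As a hedged aside, CPython can dump the state of that extra layer: recent versions (3.3 and later) expose the non-public sys._debugmallocstats(), which prints pymalloc's arena, pool and block statistics to stderr. A minimal sketch:

import sys

# Lots of small objects force pymalloc to grab many arenas from malloc().
a = [[0] * 100 for _ in range(100000)]
del a

# Prints arena/pool/block statistics to stderr; arenas the interpreter still
# holds show up here even after the objects themselves are gone.
sys._debugmallocstats()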

