Is freeing handled differently for small/large numpy arrays?

Problem description

I am trying to debug a memory problem with my large Python application. Most of the memory is in numpy arrays managed by Python classes, so Heapy etc. are useless, since they do not account for the memory in the numpy arrays. So I tried to manually track the memory usage using the MacOSX (10.7.5) Activity Monitor (or top if you will). I noticed the following weird behavior. On a normal python interpreter shell (2.7.3):

import numpy as np # 1.7.1
# Activity Monitor: 12.8 MB
a = np.zeros((1000, 1000, 17)) # a "large" array
# 142.5 MB
del a
# 12.8 MB (so far so good, the array got freed)
a = np.zeros((1000, 1000, 16)) # a "small" array
# 134.9 MB
del a
# 134.9 MB (the system didn't get back the memory)
import gc
gc.collect()
# 134.9 MB
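
The same observation can be reproduced without watching Activity Monitor by polling the process's resident set size from inside the interpreter. A minimal sketch, assuming the third-party psutil package is installed (the rss_mb helper below is just for illustration):

import os
import numpy as np
import psutil  # third-party: pip install psutil

_proc = psutil.Process(os.getpid())

def rss_mb():
    # current resident set size of this process, in MB
    return _proc.memory_info().rss / 1024.0 / 1024.0

print(rss_mb())                  # baseline
a = np.zeros((1000, 1000, 17))   # a "large" array (~136 MB)
print(rss_mb())                  # footprint grows
del a
print(rss_mb())                  # drops back: the pages went to the OS
a = np.zeros((1000, 1000, 16))   # a "small" array (~128 MB)
print(rss_mb())                  # footprint grows
del a
print(rss_mb())                  # stays high: the allocator keeps the pages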

No matter what I do, the memory footprint of the Python session will never go below 134.9 MB again. So my question is:

Why are the resources of arrays larger than 1000x1000x17x8 bytes (found empirically on my system) properly given back to the system, while the memory of smaller arrays appears to be stuck with the Python interpreter forever?

This does appear to ratchet up, since in my real-world applications, I end up with over 2 GB of memory I can never get back from the Python interpreter. Is this intended behavior that Python reserves more and more memory depending on usage history? If yes, then Activity Monitor is just as useless as Heapy for my case. Is there anything out there that is not useless?

Recommended answer

Reading from Numpy's policy for releasing memory, it seems that numpy does not do any special handling of memory allocation/deallocation: it simply calls free() when the reference count goes to zero. In fact, it is pretty easy to replicate the issue with any built-in Python object. The problem lies at the OS level.

Nathaniel Smith has written an explanation of what is happening in one of his replies in the linked thread:

In general, processes can request memory from the OS, but they cannot give it back. At the C level, if you call free(), then what actually happens is that the memory management library in your process makes a note for itself that that memory is not used, and may return it from a future malloc(), but from the OS's point of view it is still "allocated". (And python uses another similar system on top for malloc()/free(), but this doesn't really change anything.) So the OS memory usage you see is generally a "high water mark", the maximum amount of memory that your process ever needed.

The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations can be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see.

So, it seems there is no general solution to the problem. Allocating many small objects will lead to "high memory usage" as reported by profiling tools, even though that memory will be reused when needed, while allocating big objects won't show high memory usage after deallocation, because the memory is reclaimed by the OS.

You can verify this by allocating built-in Python objects:

In [1]: a = [[0] * 100 for _ in range(1000000)]

In [2]: del a

After this code I can see that memory is not reclaimed, while doing:

In [1]: a = [[0] * 10000 for _ in range(10000)]

In [2]: del a

the memory is reclaimed.

To avoid memory problems you should either allocate big arrays and work with them (maybe use views to "simulate" small arrays?), or try to avoid having many small arrays alive at the same time. If you have a loop that creates small objects, you might explicitly deallocate the ones that are no longer needed at every iteration instead of doing so only at the end.
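
For example, instead of creating many small arrays one at a time, you could carve views out of a single preallocated buffer. A rough sketch of the idea (the shapes and names below are made up for illustration):

import numpy as np

# one big allocation, which the OS can reclaim when it is freed
buf = np.zeros((1000, 1000, 16))

# the "small arrays" are just views into the big buffer: no extra allocations
for i in range(16):
    small = buf[:, :, i]   # a 1000x1000 view sharing memory with buf
    small += i             # work on the view in place

del buf  # freeing the single large block returns the memory to the OS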

I believe Python Memory Management gives good insights into how memory is managed in Python. Note that, on top of the "OS problem", Python adds another layer to manage memory arenas, which can contribute to high memory usage with small objects.
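
As a rough illustration of the arena effect (a sketch reusing the hypothetical rss_mb() helper from the earlier snippet): if a few small objects survive in each region the allocator manages, the freed space around them is kept for reuse rather than returned to the OS, so the reported memory usage barely drops.

many = [[0] * 100 for _ in range(1000000)]
print(rss_mb())            # high: millions of small objects

survivors = many[::1000]   # keep one list out of every thousand
del many
print(rss_mb())            # typically still high: the scattered survivors
                           # pin their arenas/pools, and freed blocks stay
                           # on the allocator's free lists for reuse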
