How to deallocate a typed numpy array? Is setting callback_free_data a viable option?
Question
While using an open source Cython library I found a memory leak. The leak seems to come from a typed numpy array that is not freed from memory when it goes out of scope. The declaration is the following:
cdef np.ndarray[object, ndim=1] my_array = np.empty(my_size, dtype=object)
In my understanding, this should be treated by the garbage collector like any other numpy array, and the GC should free its memory as soon as the array goes out of scope -- in this case at the end of the function in which it is declared. Apparently this does not happen.
If the array were created as a Cython array first and then cast to a numpy array, one could use the callback_free_data function as described here and here. However, in this case it is not possible to reach the pointers of my_array, and it is not possible to set the callback.
Any idea why this kind of declaration could cause a memory leak, and/or how to force the deallocation?
Update:

My question was very generic, and I wanted to avoid posting the code because it is a bit intricate, but since someone asked, here we go:
cdef dijkstra(Graph G, int start_idx, int end_idx):
    # Some code
    cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object)  # holds all of our FiboHeap node pointers
    Q = FiboHeap()
    fiboheap_nodes[start_idx] = Q.insert(0, start_idx)
    # Some other code where it could perform operations like:
    # Q.decrease_key(fiboheap_nodes[w], vw_distance)
    # End of operations
    # do we need to cleanup the fiboheap_nodes array here?
    return
FiboHeap is a Cython wrapper for the C implementation. For example, the insert function looks like this:
cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from cpython.ref cimport Py_INCREF, Py_DECREF  # modern replacement for the obsolete python_ref module

cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
    return PyCapsule_New(element, NULL, NULL)

cdef class FiboHeap:

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()

    def __dealloc__(FiboHeap self):
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        Py_INCREF(data)
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()
        return convert_fibheap_el_to_pycapsule(retValue)
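A PyCapsule like the one returned by convert_fibheap_el_to_pycapsule is only a thin wrapper around a raw pointer; it does not incref, decref, or otherwise manage whatever the pointer refers to. This can be observed from pure Python with ctypes (a minimal sketch of the general capsule behavior, unrelated to cfiboheap itself):

```python
import ctypes

# Bind the same capsule C-API functions the Cython wrapper cimports.
new_capsule = ctypes.pythonapi.PyCapsule_New
new_capsule.restype = ctypes.py_object
new_capsule.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
get_pointer.restype = ctypes.c_void_p
get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

value = ctypes.c_int(42)              # stand-in for a fibheap_el
addr = ctypes.addressof(value)

cap = new_capsule(addr, None, None)   # PyCapsule_New(element, NULL, NULL)
assert get_pointer(cap, None) == addr # the capsule stores only the bare address

del cap                               # destroying the capsule frees nothing
assert value.value == 42              # the pointed-to data is untouched
```

So storing these capsules in fiboheap_nodes neither leaks the pointed-to data by itself nor keeps that data alive; the lifetime management has to happen elsewhere.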
The __dealloc__() function works as it is supposed to, so the FiboHeap is released from memory at the end of dijkstra(...). My guess is that something is going wrong with the pointers contained in fiboheap_nodes.

Any guess?
Answer
The problem (solved in the comments) turned out not to be the deallocation of the numpy array. Instead, the numpy array held a bunch of FiboHeap objects, which themselves held pointers to a bunch of Python objects. It is these Python objects that were not being freed.
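The imbalance (an extra Py_INCREF with no matching Py_DECREF) can be reproduced in plain Python with ctypes; Py_IncRef and Py_DecRef are the C-API function counterparts of the macros used in the Cython wrapper. A minimal sketch of the principle, not the library's code:

```python
import ctypes
import sys

# These mirror the Py_INCREF/Py_DECREF calls in the Cython wrapper.
inc = ctypes.pythonapi.Py_IncRef
dec = ctypes.pythonapi.Py_DecRef
inc.restype = dec.restype = None
inc.argtypes = dec.argtypes = [ctypes.py_object]

data = object()
base = sys.getrefcount(data)

# Simulate FiboHeap.insert(): the object is increfed when its
# pointer is handed to the C structure.
inc(data)
assert sys.getrefcount(data) == base + 1  # this extra reference is the leak

# Simulate a corrected teardown: every stored pointer is decrefed.
dec(data)
assert sys.getrefcount(data) == base      # balance restored, object can die
```

Every incref that is never matched by a decref pins the object in memory forever, which is exactly what the heap's inserted objects were experiencing.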
When the Python object pointers in the FiboHeap were acquired (in insert), their reference count was incremented to ensure they were kept alive. However, when the FiboHeap was destroyed (in __dealloc__), the reference count of the Python objects it held was not decreased, causing the memory leak. The solution is to ensure that Py_DECREF is called on all the held Python objects during __dealloc__.
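One way to restore the balance, sketched with the question's cfiboheap names and under the assumption that inserted objects stay in the heap until the whole heap is destroyed: instead of pairing every manual Py_INCREF with a manual Py_DECREF in __dealloc__, hold the extra reference in an ordinary Python list, so Cython's own reference counting releases the objects when the heap dies.

```cython
cdef class FiboHeap:
    cdef cfiboheap.fibheap* treeptr
    cdef list _inserted   # Python-level references to every inserted object

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()
        self._inserted = []

    def __dealloc__(FiboHeap self):
        # No manual Py_DECREF needed: when the heap is destroyed, Cython
        # drops _inserted, releasing every object the C heap pointed at.
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        # No manual Py_INCREF: appending to _inserted keeps `data` alive
        # for as long as the C heap may dereference its pointer.
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(
            self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()
        self._inserted.append(data)
        return convert_fibheap_el_to_pycapsule(retValue)
```

A side benefit of keeping the references in a Python-level attribute is that Cython's generated tp_traverse can see them, which matters for the cycle issue described below.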
There's potentially a second, more challenging problem waiting to appear: it might be possible for the objects held by the FiboHeap themselves to contain a reference back to the FiboHeap, maybe indirectly. Python uses the function tp_traverse to find these reference loops and tp_clear to break them. Cython automatically generates a tp_traverse for its cdef classes; however, since it has no way of knowing about the Python object pointers hidden within the C FiboHeap structure, it cannot handle them correctly (possibly producing another memory leak).
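The loop-breaking machinery can be watched from pure Python (a minimal sketch, independent of the library): objects in a reference cycle keep each other's refcounts above zero, so only the cyclic garbage collector, which drives tp_traverse and tp_clear, can reclaim them.

```python
import gc

class Node:
    """Toy stand-in for an object that can point back at its container."""
    def __init__(self):
        self.ref = None

gc.collect()   # start from a clean slate

a = Node()
b = Node()
a.ref = b
b.ref = a      # cycle: refcounts alone can never drop to zero

del a, b       # unreachable, but each still holds a reference to the other

# The collector walks the cycle via tp_traverse and breaks it via tp_clear.
collected = gc.collect()
assert collected >= 2  # both Node instances were reclaimed
```

Objects referenced only from an opaque C structure are invisible to this walk, which is why a hand-written tp_traverse (or Python-level references, as above) would be needed in the pathological case.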
This is probably unlikely to happen in reality, so it may not be worth worrying about, but it is something to be aware of. A newsgroup post describes a means of generating custom tp_traverse functions in Cython. For most applications this should not be necessary -- it is only the mixture of Cython object and PyObject* that makes it slightly possible here.