How to deallocate a typed numpy array? Is setting callback_free_data a viable option?


Problem description


While using an open source Cython library I found a memory leak. The leak seems to come from a typed numpy array, which is not freed from the memory when it goes out of scope. The declaration is the following:

cdef np.ndarray[object, ndim=1] my_array = np.empty(my_size, dtype=object)


In my understanding, this should be considered by the garbage collector like any other numpy array and the GC should free its memory as soon as the array goes out of scope -- in this case at the end of the function in which it is declared. Apparently this does not happen.
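As a sanity check of that expectation, a plain Python sketch (outside Cython) confirms that an object-dtype numpy array allocated inside a function is normally freed as soon as the function returns; a weak reference to it dies together with the array:

```python
import weakref

import numpy as np

def make_array(size=10):
    # Allocate an object-dtype array locally, like my_array above,
    # and hand back only a weak reference to it.
    a = np.empty(size, dtype=object)
    return weakref.ref(a)

ref = make_array()
# The array's refcount dropped to zero when make_array returned,
# so the weak reference is already dead.
print(ref() is None)  # True
```

If the leak were in the array object itself, the weak reference would stay alive here; since it does not, the leak has to come from something the array holds.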


If the array were first created as a Cython array and then cast to a numpy array, one could use the callback_free_data function as described here and here. However, in this case it is not possible to reach the pointers of my_array, so the callback cannot be set.
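For reference, the callback_free_data technique mentioned above looks roughly like this; a sketch only, assuming a malloc'd double buffer (the names make_owned_array, data_ptr and size are placeholders, not part of the question's code):

```cython
from libc.stdlib cimport malloc, free
from cython cimport view
import numpy as np

def make_owned_array(int size):
    cdef double* data_ptr = <double*>malloc(size * sizeof(double))
    if data_ptr is NULL:
        raise MemoryError()
    # The cast wraps the raw buffer in a cython.view.array ...
    cdef view.array cy_arr = <double[:size]>data_ptr
    # ... whose callback frees the buffer once the last view of it dies.
    cy_arr.callback_free_data = free
    return np.asarray(cy_arr)
```

This only works because the Cython array object is reachable and exposes callback_free_data; an array declared directly as np.ndarray, as above, never goes through that wrapper.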


Any idea on why this kind of declaration could cause a memory leak and/or how to force the deallocation?

Update:


My question was very generic, and I wanted to avoid posting the code because it is a bit intricate, but since someone asked, here we go:

cdef dijkstra(Graph G, int start_idx, int end_idx):

    # Some code

    cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object) # holds all of our FiboHeap Nodes Pointers

    Q = FiboHeap()

    fiboheap_nodes[start_idx] = Q.insert(0, start_idx)

    # Some other code where it could perform operations like:
    # Q.decrease_key(fiboheap_nodes[w], vw_distance)

    # End of operations

    # do we need to cleanup the fiboheap_nodes array here?

    return


The FiboHeap is a Cython wrapper for the C implementation. For example, the insert function looks like this:

cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from cpython.ref cimport Py_INCREF, Py_DECREF

cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
    return PyCapsule_New(element, NULL, NULL)

cdef class FiboHeap:

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()

    def __dealloc__(FiboHeap self):
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        Py_INCREF(data)
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()

        return convert_fibheap_el_to_pycapsule(retValue)


The __dealloc__() function works as it is supposed to, so the FiboHeap is released from the memory at the end of the function dijkstra(...). My guess is that something is going wrong with the pointers contained in fiboheap_nodes. Any guess?

Answer


The problem (solved in the comments) turned out not to be the deallocation of the numpy array. Instead, the numpy array held a bunch of Fiboheap objects, which themselves held pointers to a bunch of Python objects. It's these objects that weren't freed.


When the Python object pointers in the Fiboheap were acquired (in insert) their reference count was incremented to ensure they were kept alive. However, when the Fiboheap was destroyed (in __dealloc__) the reference count of the Python objects it held was not decreased, causing the memory leak. The solution is to ensure that Py_DECREF is called on all the held Python objects during __dealloc__.
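A sketch of that fix: drain the heap in __dealloc__ and release each stored reference before destroying the tree. The names fh_min and fh_extractmin are assumptions about the underlying C API of cfiboheap, not confirmed by the question:

```cython
def __dealloc__(FiboHeap self):
    cdef void* data
    if self.treeptr is not NULL:
        # Drain the heap, dropping the extra reference taken in insert().
        # fh_min / fh_extractmin are assumed names from the C library.
        while cfiboheap.fh_min(self.treeptr) is not NULL:
            data = cfiboheap.fh_extractmin(self.treeptr)
            if data is not NULL:
                Py_DECREF(<object>data)
        cfiboheap.fh_deleteheap(self.treeptr)
```

The key invariant is that every Py_INCREF performed in insert is matched by exactly one Py_DECREF before the heap structure that held the pointer is freed.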


There's potentially a second, more challenging problem waiting to appear: the objects held by the Fiboheap might themselves contain a reference back to the Fiboheap, perhaps indirectly. Python uses the function tp_traverse to find such cycles and tp_clear to break them. Cython automatically generates a tp_traverse for its cdef classes, but since it has no way of knowing about the Python object pointers hidden inside the C Fiboheap structure, it won't handle them correctly (potentially producing another memory leak).


This is probably unlikely to happen in practice, so it may not be worth worrying about, but it's something to be aware of. A newsgroup post describes a means of generating custom tp_traverse functions in Cython. For most applications this should not be necessary; it's only the mixture of Cython object and PyObject* that makes it slightly possible here.

