Python C API: Assigning PyObjects to a dictionary causes memory leak
Problem description
I am writing a C++ wrapper for Python using the Python C API. In my case I have to make larger amounts of byte-oriented data accessible to the Python script. For this purpose I use the PyByteArray_FromStringAndSize function to produce a Python bytearray (https://docs.python.org/2.7/c-api/bytearray.html).
When I return this bytearray directly, I do not experience any problems. However, when I add the bytearray to a Python dict, the bytearray's memory is not released once the dict is destroyed.
This can be solved by calling Py_DECREF on the bytearray object after adding it to the Python dict.
Below is a complete working example of my code. It contains a function dummyArrPlain that returns the plain bytearray and a function dummyArrInDict that returns a bytearray in a dict. The second function produces a memory leak unless Py_DECREF(pyData); is called.
My question is: why is Py_DECREF necessary at this point? Intuitively, I would have expected Py_DECREF to be called once the dict is destroyed.
I also assign values to a dict like this:
PyDict_SetItem(dict, PyString_FromString("i"), PyInt_FromLong(i));
Will this also produce a memory leak if I do not call Py_DECREF on the created string and int objects?
This is my dummy C++ wrapper:
#include <python2.7/Python.h>

static char module_docstring[] = "This is a module causing a memory leak";

static PyObject *dummyArrPlain(PyObject *self, PyObject *args);
static PyObject *dummyArrInDict(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
    {"dummy_arr_plain", dummyArrPlain, METH_VARARGS, "returns a plain dummy bytearray"},
    {"dummy_arr_in_dict", dummyArrInDict, METH_VARARGS, "returns a dummy bytearray in a dict"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initlibdummy(void)
{
    PyObject *m = Py_InitModule("libdummy", module_methods);
    if (m == NULL)
        return;
}

static PyObject *dummyArrPlain(PyObject *self, PyObject *args)
{
    int len = 10000000;
    char* data = new char[len];
    for(int i=0; i<len; i++) {
        data[i] = 0;
    }
    PyObject * pyData = PyByteArray_FromStringAndSize(data, len);
    delete [] data;
    return pyData;
}

static PyObject *dummyArrInDict(PyObject *self, PyObject *args)
{
    int len = 10000000;
    char* data = new char[len];
    for(int i=0; i<len; i++) {
        data[i] = 0;
    }
    PyObject * pyData = PyByteArray_FromStringAndSize(data, len);
    delete [] data;
    PyObject *dict = PyDict_New();
    PyDict_SetItem(dict, PyString_FromString("data"), pyData);
    // memory leak without Py_DECREF(pyData);
    return dict;
}
And a dummy Python script using the wrapper:
import libdummy
import time
while True:
    a = libdummy.dummy_arr_in_dict()
    time.sleep(0.01)
Recommended answer
It's a matter of ownership rules (see [Python 2.0.Docs]: Ownership rules). I'm going to use Python 2.7.10 for the examples (pretty old, but I don't think the behavior has changed significantly since).
PyByteArray_FromStringAndSize (bytearrayobject.c:168) creates a new object using PyObject_New and also allocates memory for the buffer.
By default, the refcount of that object (or rather, of any newly created object) is 1 (set by _Py_NewReference). When the user calls del on it, or at program exit, the refcount is decreased, and when it reaches 0 the object is deallocated.
This is the behavior on the flow where the object is returned directly, as in dummyArrPlain.
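The lifecycle described above can be watched from the Python side. The following is only a small sketch, assuming CPython (other interpreters do not necessarily use reference counting); Payload is a hypothetical stand-in class, used because a bytearray cannot be weakly referenced.

```python
import sys
import weakref


class Payload(object):
    """Hypothetical stand-in for the bytearray; a plain class can be weakly referenced."""


obj = Payload()                      # one owner: the name `obj`
probe = weakref.ref(obj)             # a weak reference does not count as an owner
count_one_owner = sys.getrefcount(obj)

alias = obj                          # a second owner
count_two_owners = sys.getrefcount(obj)

del alias                            # back to one owner, the object is still alive
alive_after_first_del = probe() is not None

del obj                              # last owner gone: the refcount hits 0 and the object is freed
freed_after_last_del = probe() is None
```

Only the difference between the two counts is meaningful here, because sys.getrefcount also counts the temporary reference held by its own argument.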
But in dummyArrInDict's case, PyDict_SetItem (indirectly) does a Py_INCREF of pyData (it does other things too, but only this is relevant here), so the refcount ends up at 2. Destroying the dict only decrements it back to 1, so the object is never deallocated, hence the memory leak. PyDict_SetItem does not take over the caller's reference to either the key or the value, so the same applies to objects created inline, as in PyDict_SetItem(dict, PyString_FromString("i"), PyInt_FromLong(i)).
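The dict's own reference can be made visible with sys.getrefcount. This is a sketch that assumes CPython; the variable names are made up for illustration, and only the differences between the readings matter, since the absolute value includes the temporary reference held by the call itself.

```python
import sys

data = bytearray(b"\x00" * 16)
before = sys.getrefcount(data)       # baseline reading

holder = {"data": data}              # dict insertion adds the dict's own reference
inside = sys.getrefcount(data)

del holder                           # destroying the dict releases only that reference
after = sys.getrefcount(data)

print(inside - before, after - before)
```

So the dict does release a reference when it is destroyed, as the question expected, but only the one it took itself. The reference created by PyByteArray_FromStringAndSize still belongs to the C++ function.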
Calling Py_DECREF on pyData is basically the same thing that you're already doing with data: you allocate memory for it, and when you no longer need it, you free it (because you're not returning it, you only use it temporarily).
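The effect of the missing Py_DECREF can be imitated from Python through ctypes. This is a rough model under stated assumptions, not the wrapper itself: it assumes CPython, Payload and build_dict are hypothetical names, and the extra Py_IncRef call stands in for the creation reference that the C++ function never gives up (in pure Python the local name is released automatically when the function returns).

```python
import ctypes
import weakref


class Payload(object):
    """Hypothetical stand-in for the bytearray; a plain class can be weakly referenced."""


# Py_IncRef / Py_DecRef are the exported function forms of Py_INCREF / Py_DECREF.
py_incref = ctypes.pythonapi.Py_IncRef
py_incref.argtypes = [ctypes.py_object]
py_incref.restype = None
py_decref = ctypes.pythonapi.Py_DecRef
py_decref.argtypes = [ctypes.py_object]
py_decref.restype = None


def build_dict(forget_decref):
    data = Payload()
    probe = weakref.ref(data)
    result = {"data": data}          # the dict takes its own reference
    if forget_decref:
        py_incref(data)              # models the creation reference that is never released
    return result, probe             # Python drops the local `data` here automatically


result, probe = build_dict(forget_decref=False)
del result
freed_when_balanced = probe() is None

result, probe = build_dict(forget_decref=True)
del result
leaked_when_unbalanced = probe() is not None

stray = probe()
py_decref(stray)                     # the missing Py_DECREF, applied late
del stray
freed_after_fix = probe() is None
```

In the unbalanced case the object outlives the dict until the stray reference is released, which is what happens to the bytearray on every loop iteration of the test script.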
Note: It's safer to use the X macros (e.g. [Python 2.Docs]: Py_XDECREF), especially since you're not testing the returned PyObjects for NULL.
For more details, also take a look at [Python 2.Docs]: C API Reference.