Memory leaks in Python when using an external C DLL
Question
I have a Python module that calls a DLL written in C to encode XML strings. Once the function returns the encoded string, the memory that was allocated during this step is never deallocated. Concretely:
encodeMyString = ctypes.create_string_buffer(4096)
CallEncodingFuncInDLL(encodeMyString, InputXML)
I have looked at this, this, and this, and have also tried calling gc.collect(), but perhaps since the object has been allocated in an external DLL, Python's GC has no record of it and can't free it. Since the code keeps calling the encoding function, it keeps allocating memory, and eventually the Python process crashes. Is there a way to profile this memory usage?
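One way to start profiling is to check whether the growth happens on the Python side at all. The sketch below uses the standard-library tracemalloc module to measure how much Python-tracked memory remains after repeatedly creating the same 4096-byte ctypes buffer as in the question. A buffer from ctypes.create_string_buffer is owned by the Python object, so it should be released when it goes out of scope and the traced growth should stay small. The DLL call itself is left out because it isn't available here; the helper names python_heap_growth and make_buffer are hypothetical.

```python
import ctypes
import tracemalloc


def python_heap_growth(fn, iterations):
    """Return how many bytes of Python-tracked memory remain after calling
    fn() `iterations` times. tracemalloc only sees allocations made through
    Python's own allocators, not plain C malloc inside a DLL."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


def make_buffer():
    # Same allocation as in the question; the buffer is freed as soon as
    # the local name goes away at the end of the call.
    buf = ctypes.create_string_buffer(4096)
    return len(buf)
```

If a helper like this stays roughly flat around the real encoding call while the process's resident memory keeps climbing, that suggests the leak is in native memory that Python never sees.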
Answer
Since you haven't given any information about the DLL, this will necessarily be pretty vague, but…
Python can't track memory allocated by something external that it doesn't know about. How could it? That memory could be part of the DLL's constant segment, or allocated with mmap or VirtualAlloc, or part of a larger object, or the DLL could just be expecting it to be alive for its own use.
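To see this directly, the sketch below allocates memory with the C runtime's malloc through ctypes and asks tracemalloc how much it noticed. It assumes that ctypes.util.find_library("c") can locate the C runtime (typical on Linux and macOS; on Windows you would load a different CRT DLL). The function name traced_bytes_for_native_allocs is hypothetical.

```python
import ctypes
import ctypes.util
import tracemalloc

# Assumption: find_library("c") finds the C runtime on this platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p      # keep the full pointer value
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def traced_bytes_for_native_allocs(n_blocks, block_size):
    """Allocate n_blocks * block_size bytes with C malloc and return how much
    of that growth tracemalloc actually observed."""
    ptrs = []
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(n_blocks):
            ptrs.append(libc.malloc(block_size))
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        # Clean up ourselves; Python's GC would never free these blocks.
        for p in ptrs:
            libc.free(p)
    return after - before
```

Allocating 10 MiB this way should show up as only a few hundred bytes of traced growth (the Python list and int objects holding the addresses), because the blocks themselves come from an allocator Python doesn't know about.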
Any DLL that has a function that allocates and returns a new object has to have a function that deallocates that object. For example, if CallEncodingFuncInDLL returns a new object that you're responsible for, there will be a function like DestroyEncodedThingInDLL that takes such an object and deallocates it.
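With ctypes, the main trap when declaring such a pair is the return type: if the allocator is declared with restype = c_char_p, ctypes copies the string into a bytes object and discards the pointer, so there is nothing left to pass to the deallocator. A minimal sketch, assuming libc's strdup and free as stand-ins for the hypothetical CallEncodingFuncInDLL / DestroyEncodedThingInDLL pair (the helper declare_owned_pair is also hypothetical):

```python
import ctypes
import ctypes.util

# Stand-ins (assumption): libc's strdup/free play the roles of the
# hypothetical CallEncodingFuncInDLL / DestroyEncodedThingInDLL pair.
libc = ctypes.CDLL(ctypes.util.find_library("c"))


def declare_owned_pair(alloc_fn, free_fn):
    """Configure an allocator/deallocator pair so the returned pointer survives."""
    # c_void_p keeps the raw address. A c_char_p restype would copy the string
    # into bytes and throw the pointer away, so it could never be freed.
    alloc_fn.restype = ctypes.c_void_p
    free_fn.argtypes = [ctypes.c_void_p]
    free_fn.restype = None
    return alloc_fn, free_fn


encode, destroy = declare_owned_pair(libc.strdup, libc.free)

ptr = encode(b"<xml/>")        # we now own this memory
text = ctypes.string_at(ptr)   # copy it into a Python bytes object
destroy(ptr)                   # hand it back to the matching deallocator
```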
So, when do you call this function?
Let's step back and make this more concrete. Let's say the function is plain old strdup, so the function you call to free up the memory is free. You have two choices for when to call free. No, I have no idea why you'd ever want to call strdup from Python, but it's about the simplest possible example, so let's pretend it's not useless.
The first option is to call strdup, immediately convert the returned value to a native Python object and free it, and not have to worry about it after that:
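libc.strdup.restype = ctypes.c_void_p  # keep the raw pointer (c_char_p would copy it to bytes and lose it)
libc.free.argtypes = [ctypes.c_void_p]  # pass the full pointer, not a truncated C int
newbuf = libc.strdup(mybuf)
s = ctypes.string_at(newbuf)  # copy the C string into a Python bytes object
libc.free(newbuf)
# now use s, which is just a Python bytes object, so it's GC-able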
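Or, better, wrap this up so it's automatic. A plain Python function used directly as restype is treated as receiving a C int, which truncates pointers on 64-bit platforms, so instead keep restype as c_void_p and attach an errcheck callable, which receives the full pointer:
def convert_and_free_char_p(result, func, args):
    # errcheck hook: result is the raw address (an int) because restype is c_void_p
    try:
        return ctypes.string_at(result)
    finally:
        libc.free(ctypes.c_void_p(result))

libc.strdup.restype = ctypes.c_void_p
libc.strdup.errcheck = convert_and_free_char_p
s = libc.strdup(mybuf)
# now use s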
# now use s
But some objects can't be converted to a native Python object so easily—or they can be, but it's not very useful to do so, because you need to keep passing them back into the DLL. In that case, you can't clean it up until you're done with it.
The best way to do this is to wrap that opaque value up in a class that releases it on close or __exit__ or __del__ or whatever seems appropriate. One nice way to do this is with @contextmanager:
import contextlib

@contextlib.contextmanager
def freeing(value):
    try:
        yield value
    finally:
        libc.free(value)
So:
newbuf = libc.strdup(mybuf)
with freeing(newbuf):
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# automatically freed before you get here
# (or even if you don't, because of an exception/return/etc.)
Or:
@contextlib.contextmanager
def strduping(buf):
    value = libc.strdup(buf)
    try:
        yield value
    finally:
        libc.free(value)
Now:
with strduping(mybuf) as newbuf:
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# again, automatically freed here
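The class-based option mentioned above (releasing on close, __exit__, or __del__) can also be sketched. This is a minimal version, assuming libc's strdup/free as stand-ins for the DLL's real allocator and deallocator; the class name OwnedCString is hypothetical. close() is idempotent so all three release paths can share it, and __del__ is only a safety net because its timing isn't guaranteed.

```python
import ctypes
import ctypes.util

# Assumption: libc's strdup/free stand in for the DLL's allocate/free pair.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strdup.restype = ctypes.c_void_p
libc.strdup.argtypes = [ctypes.c_char_p]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


class OwnedCString:
    """Owns a pointer returned by strdup and frees it exactly once."""

    _ptr = None  # class default so __del__ is safe if __init__ fails early

    def __init__(self, data):
        self._ptr = libc.strdup(data)
        if not self._ptr:
            raise MemoryError("strdup returned NULL")

    @property
    def ptr(self):
        # Opaque handle to pass back into the library while still open.
        if self._ptr is None:
            raise ValueError("already closed")
        return self._ptr

    def value(self):
        return ctypes.string_at(self.ptr)

    def close(self):
        # Idempotent: safe to reach via close(), __exit__, and __del__.
        if self._ptr is not None:
            libc.free(self._ptr)
            self._ptr = None

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def __del__(self):
        # Last-resort cleanup; prefer close() or a with block.
        self.close()
```

Usage mirrors the context-manager version: `with OwnedCString(b"...") as s:` keeps the pointer alive for the block and frees it on exit, while code that needs the handle for longer can hold the object and call close() when done.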