Finding a memory leak in Python with the tracemalloc module

Question

I have a Python script that uses an open-source PyTorch model, and this code has a memory leak. I am running it with memory_profiler (mprof run --include-children python my_script.py) and get the following image:

I am trying to find the reason for the leak with the standard-library tracemalloc module:

import tracemalloc

tracemalloc.start(25)  # store up to 25 frames per allocation
while True:
    ...
    snap = tracemalloc.take_snapshot()
    # Keep only traces from allocator domain 0.
    domain_filter = tracemalloc.DomainFilter(True, 0)
    snap = snap.filter_traces([domain_filter])
    stats = snap.statistics('lineno', True)  # cumulative stats per line
    for stat in stats[:10]:
        print(stat)

Looking only at the tracemalloc output, I am not able to identify the problem. I assume that the problem is in a C extension, but I would like to make sure that is true. I tried to change the domain via DomainFilter, but I only get output for domain 0.

Also, I don't understand the meaning of the parameter that tracemalloc.start(frameno) takes; frameno is the number of most recent frames to store per allocation, but nothing happens when I change it.
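
For context, the nframes argument to tracemalloc.start(nframes) only changes what you can see when statistics are grouped by 'traceback'; grouping by 'lineno' reports just the most recent frame, which is why changing it appears to do nothing. A minimal sketch, reusing the snap variable from the snippet above:

stats = snap.statistics('traceback')
for stat in stats[:3]:
    print(stat.count, "allocations,", stat.size, "bytes")
    for line in stat.traceback.format():
        print(line)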

What can I do next to find the place in the code that causes the memory leak?

Looking forward to your reply.

Answer

Given that your guess is that the problem is in a C extension, but that you want to make sure this is true, I would suggest using a tool that is less Python-specific, such as https://github.com/vmware/chap, at least if you are able to run your program on Linux.

What you will need to do is run your script (uninstrumented) and at some point gather a live core (for example, using "gcore pid-of-your-running-program").
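
If it is awkward to catch the process at the right moment from outside, one option is to have the script dump its own core at a point of interest. A minimal sketch, assuming gcore (shipped with gdb) is installed and that ptrace permissions allow it (on some distributions kernel.yama.ptrace_scope restricts this):

import os
import subprocess

# Dump a live core of the current process (written as core.<pid>)
# without killing it.
subprocess.run(["gcore", str(os.getpid())], check=True)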

Once you have that core, open it in chap ("chap your-core-file-path") and try the following command from the chap prompt:

summarize writable

The output will be something like this, but your numbers will likely vary considerably:

chap> summarize writable
5 ranges take 0x2021000 bytes for use: stack
6 ranges take 0x180000 bytes for use: python arena
1 ranges take 0xe1000 bytes for use: libc malloc main arena pages
4 ranges take 0x84000 bytes for use: libc malloc heap
8 ranges take 0x80000 bytes for use: used by module
1 ranges take 0x31000 bytes for use: libc malloc mmapped allocation
4 ranges take 0x30000 bytes for use: unknown
29 writable ranges use 0x23e7000 (37,646,336) bytes.

The lines in the summary are given in decreasing order of byte usage, so you can follow that order. Looking at the top line first, we see that the use is "stack":

5 ranges take 0x2021000 bytes for use: stack

This particular core was for a very simple Python program that starts 4 extra threads and has all 5 threads sleep. The reason large stack allocations can happen rather easily with a multi-threaded Python program is that Python uses pthreads to create additional threads, and pthreads uses the ulimit value for stack size as a default. If your program has a similarly large value, you can change the stack size in one of several ways, including running "ulimit -s" in the parent process to change the default stack size. To see what values actually make sense, you can use the following command from the chap prompt:

chap> describe stacks
Thread 1 uses stack block [0x7fffe22bc000, 7fffe22dd000)
 current sp: 0x7fffe22daa00
Peak stack usage was 0x7798 bytes out of 0x21000 total.

Thread 2 uses stack block [0x7f51ec07c000, 7f51ec87c000)
 current sp: 0x7f51ec87a750
Peak stack usage was 0x2178 bytes out of 0x800000 total.

Thread 3 uses stack block [0x7f51e7800000, 7f51e8000000)
 current sp: 0x7f51e7ffe750
Peak stack usage was 0x2178 bytes out of 0x800000 total.

Thread 4 uses stack block [0x7f51e6fff000, 7f51e77ff000)
 current sp: 0x7f51e77fd750
Peak stack usage was 0x2178 bytes out of 0x800000 total.

Thread 5 uses stack block [0x7f51e67fe000, 7f51e6ffe000)
 current sp: 0x7f51e6ffc750
Peak stack usage was 0x2178 bytes out of 0x800000 total.

5 stacks use 0x2021000 (33,689,600) bytes.

So what you see above is that 4 of the stacks are 8 MiB in size but could easily be well under 64 KiB.

Your program may not have any issues with stack size, but if it does, you can fix them as described above.
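
One such fix can be applied from inside the script itself: threading.stack_size() sets the stack size for threads created after the call. A minimal sketch; the 512 KiB figure is an assumption, so choose a value safely above the peak usage that "describe stacks" reported, and worker is a hypothetical thread function:

import threading

# Applies only to threads created after this call.
threading.stack_size(512 * 1024)

def worker():
    ...  # hypothetical thread body

t = threading.Thread(target=worker)
t.start()
t.join()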

Continuing with checking for causes of growth, look at the next line from the summary:

6 ranges take 0x180000 bytes for use: python arena

So Python arenas use the next most memory. These are used strictly for Python-specific allocations. If this value is large in your case, it disproves your theory about C allocations being the culprit, but there is more you can do later to figure out how those Python allocations are being used.

Looking at the remaining lines of the summary, we see a few with "libc" as part of the "use" description:

1 ranges take 0xe1000 bytes for use: libc malloc main arena pages
4 ranges take 0x84000 bytes for use: libc malloc heap
1 ranges take 0x31000 bytes for use: libc malloc mmapped allocation

Note that libc is responsible for all of that memory, but you can't conclude that the memory is used by non-Python code, because for allocations beyond a certain size threshold (well under 4K) Python grabs memory via malloc rather than from one of the Python arenas.
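
To see the pymalloc side of this split from within CPython itself, sys._debugmallocstats() dumps the small-object allocator's arena and pool statistics to stderr. It is a CPython-specific, semi-private helper, so treat this as a diagnostic aid rather than a stable API:

import sys

# Prints pymalloc arena/pool usage statistics to stderr.
sys._debugmallocstats()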

So let's assume that you have resolved any issues you might have had with stack usage, and that your usage is mainly related to "python arena" or "libc malloc". The next thing you want to understand is whether that memory is mostly "used" (meaning allocated but never freed) or "free" (meaning freed but not given back to the operating system). You can do that as shown:

chap> count used
15731 allocations use 0x239388 (2,331,528) bytes.
chap> count free
1563 allocations use 0xb84c8 (754,888) bytes.

So in the above case, used allocations dominate, and what one should do is try to understand those used allocations. The case where free allocations dominate is much more complex; it is discussed a bit in the user guide but would take too much time to cover here.

So let's assume for now that used allocations are the main cause of growth in your case. We can try to find out why we have so many used allocations.

The first thing we might want to know is whether any allocations were actually "leaked", in the sense that they are no longer reachable. This excludes the case where the growth is due to container-based growth.

The way to do that is as follows:

chap> summarize leaked
0 allocations use 0x0 (0) bytes.

So for this particular core, as is pretty common for Python cores, nothing was leaked. Your number may be non-zero. If it is non-zero but still much lower than the totals associated with memory used for "python" or "libc" reported above, you might just make a note of the leaks but continue to look for the real cause of growth. The user guide has some information about investigating leaks, but it is a bit sparse. If the leak count is actually big enough to explain your growth issue, you should investigate that next; if not, read on.
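
As a pure-Python cross-check for container-based growth, you can also count live, gc-tracked objects by type between iterations of the suspect loop; types whose counts keep climbing point at the growing containers. A minimal sketch, with the placement around one loop iteration left hypothetical:

import gc
from collections import Counter

def live_object_counts():
    # Count gc-tracked objects by type name.
    return Counter(type(o).__name__ for o in gc.get_objects())

before = live_object_counts()
# ... run one iteration of the suspected leaking code here ...
gc.collect()
growth = live_object_counts() - before
for name, delta in growth.most_common(10):
    print(name, delta)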

Now that you are assuming container-based growth, the following commands are useful:

chap> redirect on
chap> summarize used
Wrote results to scratch/core.python_5_threads.summarize_used
chap> summarize used /sortby bytes
Wrote results to scratch/core.python_5_threads.summarize_used::sortby:bytes

The above created two text files: one has a summary ordered by object counts and the other has a summary ordered by the total bytes used directly by those objects.

At present chap has only very limited support for Python (it finds Python objects, in addition to anything allocated by libc malloc), and for Python objects the summary only breaks out limited categories in terms of patterns (for example, %SimplePythonObject matches things like "int" and "str" that don't hold other Python objects, while %ContainerPythonObject matches things like tuple, list, and dict that do hold references to other Python objects). That said, it should be pretty easy to tell from the summary whether the growth in used allocations is primarily due to objects allocated by Python or objects allocated by native code.

So in this case, given that you specifically are trying to find out whether the growth is due to native code or not, look in the summary for counts like the following, all of which are Python-related:

Pattern %SimplePythonObject has 7798 instances taking 0x9e9e8(649,704) bytes.
Pattern %ContainerPythonObject has 7244 instances taking 0xc51a8(807,336) bytes.
Pattern %PyDictKeysObject has 213 instances taking 0xb6730(747,312) bytes.

So in the core I have been using as an example, Python allocations definitely dominate.

You will also see a line like the following, for allocations that chap does not yet recognize. You can't make assumptions about whether these are Python-related or not.

Unrecognized allocations have 474 instances taking 0x1e9b8(125,368) bytes.

This will hopefully answer your basic question of what you can do next. At least at that point you will understand whether the growth is likely due to C code or Python code, and depending on what you find, the chap user guide should help you go further from there.
