Memory leak using pandas dataframe
Problem description
I am using pandas.DataFrame in multi-threaded code (actually a custom subclass of DataFrame called Sound). I have noticed that I have a memory leak: the memory usage of my program increases gradually over about 10 minutes, until it finally reaches ~100% of my computer's memory and crashes.
I used objgraph to try to track this leak, and found that the count of MyDataFrame instances keeps going up even though it shouldn't: every thread, in its run method, creates an instance, does some calculations, saves the result to a file and exits, so no references should be kept.
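The instance-counting check described above can be reproduced with the standard library alone. The sketch below is a minimal, hypothetical stand-in for what objgraph.count() does: it walks the objects tracked by the garbage collector and counts live instances of a class (MyDataFrame here is a dummy class, not the real DataFrame subclass).

```python
import gc

class MyDataFrame:  # hypothetical stand-in for the pandas.DataFrame subclass
    pass

def count_instances(cls):
    """Count live instances of cls, roughly what objgraph.count() does."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)

def worker():
    df = MyDataFrame()   # each thread builds one frame...
    # ... calculations and saving to a file would happen here ...
    del df               # ...and drops its reference before exiting

baseline = count_instances(MyDataFrame)
worker()
gc.collect()             # collect any reference cycles before re-counting
leaked = count_instances(MyDataFrame) - baseline
print(leaked)            # 0 if no references are kept alive
```

If the count keeps climbing after each worker finishes, something outside the worker (a cache, a module-level list, a logging handler) is still holding a reference.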
Using objgraph I found that all the data frames in memory have a similar reference graph. I have no idea whether that's normal or not; it looks like this is what is keeping my objects in memory. Any idea, advice or insight?
Confirmed that there's some kind of memory leak going on in the indexing infrastructure. It's not caused by the above reference graph. Let's move the discussion to GitHub (SO is for Q&A):
https://github.com/pydata/pandas/issues/2659
EDIT: this actually appears not to be a memory leak at all, but may have to do with how the OS allocates memory. Please have a look at the GitHub issue for more information.
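The distinction the edit draws, Python objects being freed while the process footprint stays high, can be observed directly. This is a Linux-specific sketch (it reads /proc, so it won't run elsewhere): allocate a large list, free it, and compare the resident set size before, during, and after.

```python
import gc
import os

def rss_kb():
    """Resident set size of this process in kB (Linux-only, via /proc)."""
    with open(f"/proc/{os.getpid()}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

before = rss_kb()
data = [float(i) for i in range(2_000_000)]  # tens of MB of float objects
during = rss_kb()
del data
gc.collect()
after = rss_kb()
print(before, during, after)
# `after` often stays well above `before`: the freed memory is returned to
# CPython's allocator and the C library, not necessarily to the OS, so
# RSS-based monitoring can look like a leak when no Python objects survive.
```

This is why tools that count Python objects (objgraph, gc) and tools that watch process memory (top, ps) can disagree, as in the GitHub issue above.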