Memory leak using pandas dataframe

Question

I am using pandas.DataFrame in multi-threaded code (actually a custom subclass of DataFrame called Sound). I have noticed what looks like a memory leak: my program's memory usage grows gradually over about 10 minutes until it reaches ~100% of my computer's memory and the program crashes.

I used objgraph to try to track down this leak and found that the count of instances of MyDataFrame keeps going up when it shouldn't: every thread creates an instance in its run method, makes some calculations, saves the result in a file and exits, so no references should be kept.
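The instance-count check described above can be approximated with the standard library alone. The following is a minimal sketch, not the code from the question: `Sound` here is a plain stand-in class (the real one subclasses DataFrame), and `count_instances` and `worker` are hypothetical helpers introduced only for illustration.

```python
import gc
import threading


class Sound:
    """Stand-in for the DataFrame subclass in the question (hypothetical body)."""

    def __init__(self, n):
        self.samples = list(range(n))


def count_instances(cls):
    """Count live objects of exactly type `cls` that the garbage collector tracks."""
    gc.collect()  # drop unreachable cycles first so they are not counted
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def worker():
    # Mirrors the described thread body: create an instance, compute, exit.
    sound = Sound(1000)
    sum(sound.samples)


before = count_instances(Sound)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

after = count_instances(Sound)
leaked = after - before
print(f"Sound instances before={before}, after={after}, leaked={leaked}")
```

If the count keeps rising across runs, something still holds references to the instances; if it returns to the starting value while process memory stays high, the cause is probably elsewhere.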

Using objgraph, I found that all the data frames in memory have a similar reference graph:

I have no idea whether that's normal or not, but it looks like this is what is keeping my objects in memory. Any ideas, advice or insight?

Answer

Confirmed that there's some kind of memory leak going on in the indexing infrastructure. It's not caused by the above reference graph. Let's move the discussion to GitHub (SO is for Q&A):

https://github.com/pydata/pandas/issues/2659

EDIT: this actually appears not to be a memory leak at all; it may instead be related to how the OS allocates memory. Please have a look at the GitHub issue for more information.
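One way to tell a reference leak apart from memory that the allocator or OS simply has not reclaimed is to look at Python-level allocations directly. The sketch below uses the standard `tracemalloc` module and is an assumption about how one might check this, not something taken from the GitHub issue; the block size and count are arbitrary.

```python
import gc
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Allocate roughly 10 MB in many small blocks (arbitrary illustrative sizes).
blocks = [bytearray(1024) for _ in range(10_000)]
in_use, _ = tracemalloc.get_traced_memory()

# Drop the only reference and collect; Python-level usage should fall back.
del blocks
gc.collect()
after_free, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"allocated: {in_use - baseline} bytes")
print(f"still held after free: {after_free - baseline} bytes")
```

If the traced figure drops back like this while the process's resident memory reported by the OS stays high, the objects themselves were released and the remaining usage sits below the Python level.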
