Memory leak using pandas dataframe

Problem description

I am using pandas.DataFrame in multi-threaded code (actually a custom subclass of DataFrame called Sound). I have noticed a memory leak: my program's memory usage grows gradually over about 10 minutes until it reaches ~100% of my computer's memory and the program crashes.

I used objgraph to try to track down this leak, and found that the number of MyDataFrame instances keeps going up when it shouldn't: every thread creates an instance in its run method, does some calculations, saves the result to a file and exits, so no references should be kept.
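The instance-count check described above can be approximated with the standard library alone. The sketch below is only an illustration under stated assumptions: `Sound` here is a plain stand-in class rather than the real DataFrame subclass, and `count_instances`, `worker` and `run_batch` are hypothetical helper names, not part of objgraph or pandas. It counts the instances the garbage collector still tracks after all threads have finished.

```python
import gc
import threading


class Sound:
    """Plain stand-in for the DataFrame subclass in the question (assumption)."""

    def __init__(self, n):
        self.samples = list(range(n))


def count_instances(cls):
    """Count live instances of cls among the objects tracked by the GC."""
    return sum(1 for obj in gc.get_objects() if issubclass(type(obj), cls))


def worker(results, i):
    """Create an instance inside the thread and keep only the computed result."""
    sound = Sound(1000)
    results[i] = sum(sound.samples)


def run_batch(n_threads=8):
    """Run the workers, then report their results and the live instance count."""
    results = [None] * n_threads
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    gc.collect()  # also clear anything that was only held alive by a cycle
    return results, count_instances(Sound)


if __name__ == "__main__":
    results, live = run_batch()
    print("live Sound instances after all threads exited:", live)
```

If the count keeps growing from one batch to the next, something is still holding references; if it stays flat while process memory keeps growing, the cause probably lies elsewhere.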

Using objgraph, I found that all the data frames in memory have a similar reference graph:

I have no idea whether that's normal, but it looks like this is what is keeping my objects in memory. Any ideas, advice, or insight?

Solution

Confirmed that there's some kind of memory leak going on in the indexing infrastructure. It's not caused by the reference graph above. Let's move the discussion to GitHub (SO is for Q&A):

https://github.com/pydata/pandas/issues/2659

EDIT: this actually appears not to be a memory leak at all; it may instead have to do with how the OS allocates memory. Please have a look at the GitHub issue for more information.
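One rough way to tell these two situations apart is to look at the memory Python itself believes it has allocated. The sketch below uses the standard-library `tracemalloc` module; `traced_after_release` is a hypothetical helper name, and the byte arrays merely stand in for real data. It is a diagnostic aid under those assumptions, not the method used in the GitHub issue.

```python
import gc
import tracemalloc


def traced_after_release(n_blocks=200, block_size=10_000):
    """Return traced bytes at baseline, while holding data, and after release."""
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    # Allocate some data and record how much Python-level memory is in use.
    blocks = [bytearray(block_size) for _ in range(n_blocks)]
    held, _ = tracemalloc.get_traced_memory()

    # Drop the only reference and measure again.
    del blocks
    gc.collect()
    released, peak = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return baseline, held, released, peak


if __name__ == "__main__":
    baseline, held, released, peak = traced_after_release()
    print(f"baseline={baseline} held={held} released={released} peak={peak}")
```

If the traced figure falls back after the objects are released while the process size reported by the operating system stays high, that points toward allocator or OS behaviour rather than Python objects being kept alive. Note that `tracemalloc` only sees allocations made through Python's memory allocators.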
