Memory leak in creating a buffer with pandas?


Question

I'm using pandas to implement a ring buffer, but memory use keeps growing. What am I doing wrong?

Here is the code (edited a little from the first post of the question):

import numpy as np
import pandas as pd
import resource

# Start with a 10000-row buffer of zeros.
tempdata = np.zeros((10000, 3))
tdf = pd.DataFrame(data=tempdata, columns=['a', 'b', 'c'])

i = 0
while True:
    i += 1
    littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
    # Drop the oldest 1000 rows and append the new ones.
    # concat builds a new DataFrame on every iteration.
    tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)
    del littledf
    # ru_maxrss is the peak resident set size (bytes on macOS, kilobytes on Linux).
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print('total memory:%d kb' % (int(currentmemory) // 1000))

This is what I get:

total memory:37945 kb
total memory:38137 kb
total memory:38137 kb
total memory:38768 kb
total memory:38768 kb
total memory:38776 kb
total memory:38834 kb
total memory:38838 kb
total memory:38838 kb
total memory:38850 kb
total memory:38854 kb
total memory:38871 kb
total memory:38871 kb
total memory:38973 kb
total memory:38977 kb
total memory:38989 kb
total memory:38989 kb
total memory:38989 kb
total memory:39399 kb
total memory:39497 kb
total memory:39587 kb
total memory:39587 kb
total memory:39591 kb
total memory:39604 kb
total memory:39604 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39612 kb

Not sure whether it is related to this:

https://github.com/pydata/pandas/issues/2659

Tested on a MacBook Air with Anaconda Python.
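
A note on reading the numbers above: `ru_maxrss` is a high-water mark (the peak resident set size), so the printed value can only stay flat or rise, and it is reported in bytes on macOS but in kilobytes on Linux. The sketch below is only an illustration of that behaviour, not part of the original post. The helper name `peak_rss_kb` and the 50 MB scratch buffer are assumptions made up for the example.

```python
import resource
import sys


def peak_rss_kb():
    """Return this process's peak resident set size in kilobytes.

    Hypothetical helper. ru_maxrss is a high-water mark, so the value
    never decreases, even after memory has been released.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS reports bytes; Linux reports kilobytes.
        return peak // 1024
    return peak


before = peak_rss_kb()
scratch = bytearray(50 * 1024 * 1024)  # allocate a temporary ~50 MB buffer
del scratch                            # release it again
after = peak_rss_kb()

# The peak after freeing is never lower than the peak before allocating.
print("peak before: %d kB, peak after freeing: %d kB" % (before, after))
```

Because the figure is a peak, a slowly rising value may reflect occasional larger transient allocations rather than memory that is never released. It does not rule out a leak either.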

Recommended answer

Instead of using concat, why not update the DataFrame in place? i % 10 determines which 1000-row slot you write to on each update.

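# Assumes np, resource and tdf (10000 rows, columns a, b, c) are set up as in the question.
i = 0
while True:
    i += 1
    # Overwrite one 1000-row slot in place; i % 10 selects the slot.
    start = 1000 * (i % 10)
    tdf.iloc[start:start + 1000] = np.random.rand(1000, 3)
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print('total memory:%d kb' % (int(currentmemory) // 1000))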
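
One consequence of writing in place is that the rows are no longer stored oldest-first, as they were with concat. A minimal sketch of how the in-place approach above could be wrapped, with a bounded loop so it terminates, follows. The names `write_chunk`, `oldest_first`, `SLOT_ROWS` and `N_SLOTS` are hypothetical and not from the original answer, and each chunk is filled with its update number rather than random data so the slots are easy to identify.

```python
import numpy as np
import pandas as pd

SLOT_ROWS = 1000   # rows written per update (value taken from the question)
N_SLOTS = 10       # the buffer holds N_SLOTS * SLOT_ROWS rows
COLUMNS = ['a', 'b', 'c']

# Build the buffer once; later updates overwrite rows instead of
# constructing a new DataFrame.
tdf = pd.DataFrame(np.zeros((SLOT_ROWS * N_SLOTS, len(COLUMNS))), columns=COLUMNS)


def write_chunk(buf, i, chunk):
    """Overwrite slot (i % N_SLOTS) of buf with chunk; return the slot's first row."""
    start = SLOT_ROWS * (i % N_SLOTS)
    buf.iloc[start:start + SLOT_ROWS] = chunk
    return start


def oldest_first(buf, i):
    """Return a reordered copy of buf with the oldest slot first.

    Hypothetical helper: after update i, slot (i % N_SLOTS) holds the
    newest rows, so the oldest rows start one slot later.
    """
    start = SLOT_ROWS * ((i + 1) % N_SLOTS)
    return pd.concat([buf.iloc[start:], buf.iloc[:start]], ignore_index=True)


for i in range(1, 26):  # 25 updates, bounded for demonstration
    write_chunk(tdf, i, np.full((SLOT_ROWS, len(COLUMNS)), float(i)))

ordered = oldest_first(tdf, 25)
print(tdf.shape)  # the buffer's shape is the same after every update
# After 25 updates the buffer holds updates 16 (oldest) through 25 (newest).
print(ordered['a'].iloc[0], ordered['a'].iloc[-1])
```

The reordering step does use concat, but only when a chronological view is actually needed, not on every update.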
