memory use in large data-structures manipulation/processing

Problem Description

I have a number of large (~100 MB) files which I process regularly. Even though I try to delete unneeded data structures during processing, memory consumption is a bit too high. I was wondering whether there is a way to manipulate large data efficiently, e.g.:

def read(self, filename):
    fc = read_100_mb_file(filename)
    self.process(fc)

def process(self, content):
    # do some processing of the file content
    ...

Is there a duplication of data structures? Wouldn't it be more memory-efficient to use a class-wide attribute such as self.fc?

When should I use garbage collection? I know about the gc module, but should I call it after I del fc, for example?
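In other words, something along these lines (a minimal sketch of the pattern in question; read_100_mb_file is the same placeholder as in the snippet above):

import gc

def read(self, filename):
    fc = read_100_mb_file(filename)
    self.process(fc)
    del fc        # drop the local reference to the large structure
    gc.collect()  # is an explicit collection pass here worthwhile?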



Update

P.S. 100 MB is not a problem in itself, but the float conversion and further processing add significantly to both the working set and the virtual size (I'm on Windows).

Solution

I'd suggest looking at David Beazley's presentation on using generators in Python. This technique lets you handle a lot of data and do complex processing quickly, without blowing up your memory use. IMO, the trick isn't holding a huge amount of data in memory as efficiently as possible; the trick is avoiding loading a huge amount of data into memory at the same time.
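For illustration, here is a rough sketch of that generator style (the whitespace-separated-floats file layout and the summing step are assumptions made up for the example, not details from the answer):

def read_records(filename):
    # Yield one parsed record at a time instead of materializing
    # the whole ~100 MB file as a single in-memory structure.
    with open(filename) as f:
        for line in f:
            yield [float(x) for x in line.split()]

def process(records):
    # Consume the stream lazily; only the current record is alive at once.
    total = 0.0
    for record in records:
        total += sum(record)
    return total

result = process(read_records('large_file.txt'))

Because read_records is a generator, process pulls one line at a time, so peak memory stays roughly constant regardless of file size.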

