memory use in large data-structures manipulation/processing
Question
I have a number of large (~100 MB) files which I'm regularly processing. While I'm trying to delete unneeded data structures during processing, memory consumption is a bit too high. I was wondering if there is a way to efficiently manipulate large data, e.g.:
def read(self, filename):
    fc = read_100_mb_file(filename)
    self.process(fc)

def process(self, content):
    # do some processing of file content
    ...
Is there a duplication of data structures? Isn't it more memory efficient to use a class-wide attribute like self.fc?
When should I use garbage collection? I know about the gc module, but should I call it after I del fc, for example? (A small sketch of this pattern follows the update below.)
Update
P.S. 100 MB is not a problem in itself, but the float conversion and further processing add significantly more to both the working set and the virtual size (I'm on Windows).
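For reference, here is a minimal, self-contained sketch of the explicit-cleanup pattern asked about above; the Processor class, the whole-file read and the print are stand-ins for illustration, not the original code:

import gc

class Processor:
    def read(self, filename):
        # read the whole file at once, as in the question
        with open(filename) as f:
            fc = f.read()
        self.process(fc)
        del fc        # drop the local name; CPython frees acyclic objects
                      # immediately via reference counting
        gc.collect()  # only reclaims reference cycles, so it rarely helps here

    def process(self, content):
        # passing `content` does not copy the data; binding it to self.fc
        # would not copy it either, but would keep the 100 MB alive for as
        # long as the Processor instance exists
        print(len(content))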
I'd suggest looking at the presentation by David Beazley on using generators in Python. This technique allows you to handle a lot of data, and do complex processing, quickly and without blowing up your memory use. IMO, the trick isn't holding a huge amount of data in memory as efficiently as possible; the trick is avoiding loading a huge amount of data into memory at the same time.
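As a rough illustration of that generator style (the whitespace-separated-floats file format, the file name data.txt and the running sum are assumptions made for the sketch, not part of the original question):

def read_records(filename):
    # yield one parsed record at a time instead of loading the whole file
    with open(filename) as f:
        for line in f:
            yield [float(x) for x in line.split()]

def process(records):
    # consume the stream; only one record needs to be in memory at a time
    total = 0.0
    for rec in records:
        total += sum(rec)
    return total

result = process(read_records('data.txt'))

Because read_records yields lazily, the working set stays around a single record regardless of file size, and extra stages such as filtering or conversion can be chained as further generators, which is the composition style the Beazley slides demonstrate.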