How do I prevent memory leak when I load large pickle files in a for loop?
I have 50 pickle files that are 0.5 GB each. Each pickle file is comprised of a list of custom class objects. I have no trouble loading the files individually using the following function:
import pickle

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj
However, when I try to iteratively load the files I get a memory leak.
l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))
My memory overflows before loaded filepath2 is printed.
How can I write code that guarantees that only a single pickle is loaded during each iteration?
Answers to related questions on SO suggest using objects defined in the weakref module or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.
Related: Python garbage collection
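As background on how referencing works under the hood: CPython frees an object the moment its last reference disappears, and sys.getrefcount makes this visible. A minimal illustration (the list here is just a small stand-in for a loaded pickle):

```python
import sys

big = [0] * 1000
# getrefcount reports one extra reference (its own argument),
# so a single name typically shows a count of 2.
count_one = sys.getrefcount(big)

alias = big                        # a second name now refers to the same list
count_two = sys.getrefcount(big)

alias = None                       # drop one reference
big = None                         # drop the last one; CPython frees the list at once

print(count_two - count_one)       # the alias added exactly one reference
```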
You can fix that by adding x = None as the first statement inside for fp in l:. This works because it drops the only reference to the list loaded in the previous iteration, hence allowing the Python garbage collector to free that memory before loadPickle() is called again. Without it, the previous list is still referenced by x while the next file is being loaded, so two 0.5 GB lists are alive at once.
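Put together, a self-contained sketch of the fixed loop. The temporary files written below are small stand-ins for the 0.5 GB pickles in the question, and the gc.collect() call is an optional extra (useful if the objects form reference cycles), not part of the core fix:

```python
import gc
import os
import pickle
import shutil
import tempfile

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        return pickle.load(fh)

# Create a few small pickle files standing in for the large ones.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    fp = os.path.join(tmpdir, 'data{0}.pkl'.format(i))
    with open(fp, 'wb') as fh:
        pickle.dump(list(range(i, i + 5)), fh)
    paths.append(fp)

x = None
for fp in paths:
    x = None          # drop the reference to the previous list first
    gc.collect()      # optional: also reclaim any cyclic garbage
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))

shutil.rmtree(tmpdir)
```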