Memory leak in adding list values

Question

I'm new to Python and have a big memory problem. My script runs 24/7, and every day it allocates about 1 GB more of my memory. I could narrow it down to this function:

Code:
#!/usr/bin/env python
# coding: utf8
import gc

from pympler import muppy
from pympler import summary
from pympler import tracker

v_list = [{
    'url_base': 'http://www.immoscout24.de',
    'url_before_page': '/Suche/S-T/P-',
    'url_after_page': '/Wohnung-Kauf/Hamburg/Hamburg/-/-/50,00-/EURO--500000,00?pagerReporting=true',
}]

# returns url
def get_url(v, page_num):
    return v['url_base'] + v['url_before_page'] + str(page_num) + v['url_after_page']

while True:
    gc.enable()
    for v_idx, v in enumerate(v_list):
        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)

        # magic happens here
        url = get_url(v, 1)

        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)

    # collects unlinked objects
    gc.collect()
Output:
======================== | =========== | ============
list | 26154 | 10.90 MB
str | 31202 | 1.90 MB
dict | 507 | 785.88 KB
The list entry in particular grows by around 600 KB every cycle, and I have no idea why. As far as I can tell, I don't store anything here, and the url variable should be overwritten each time, so there should be hardly any memory consumption at all.

What am I missing here? :-)
Answer

This "memory leak" is 100% caused by your testing for memory leaks. The all_objects list ends up maintaining a list of almost every object you ever created, even the ones you don't need anymore. They would have been cleaned up if they weren't in all_objects, but they are.
As a quick test:

- If I run this code as-is, I get the list value growing by about 600 KB/cycle, just as you say in your question, at least up to 20 MB, where I killed it.
- If I add del all_objects right after the sum1 = line, however, I get the list value bouncing back and forth between 100 KB and 650 KB.
If you think about why this is happening, it's pretty obvious in retrospect. At the point when you call muppy.get_objects() (except the first time), the previous value of all_objects is still alive, so it is one of the objects that gets returned. That means that even when you assign the return value to all_objects, you're not freeing the old value; you're just dropping its refcount from 2 to 1. That keeps alive not just the old list itself but every element within it, which, by definition, is everything that was alive the last time through the loop.
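A minimal, stdlib-only sketch of that rebinding point: assigning a new list to a name does not free the old list as long as something else still holds a reference to it.

```python
old = [1, 2, 3]
holder = [old]    # a second reference keeps the first list alive
old = [4, 5, 6]   # rebinding 'old' only drops the old list's refcount from 2 to 1

# the original list is still fully reachable through 'holder'
print(holder[0])  # [1, 2, 3]
```

This is exactly the situation the old all_objects snapshot is in when muppy.get_objects() runs again: the name has been rebound, but the previous snapshot (and everything inside it) is still referenced, so nothing gets collected.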
If you can find a memory-exploring library that gives you weakrefs instead of normal references, that might help. Otherwise, make sure to do a del all_objects at some point before calling muppy.get_objects again. (Right after the only place you use it, the sum1 = line, seems like the most obvious place.)