Memory leak in adding list values


Problem Description

I'm new to Python and have a big memory issue. My script runs 24/7, and each day it allocates about 1 GB more memory. I could narrow it down to this function:

Code:

#!/usr/bin/env python
# coding: utf8
import gc
from pympler import muppy
from pympler import summary
from pympler import tracker


v_list = [{ 
     'url_base' : 'http://www.immoscout24.de',
     'url_before_page' : '/Suche/S-T/P-',
     'url_after_page' : '/Wohnung-Kauf/Hamburg/Hamburg/-/-/50,00-/EURO--500000,00?pagerReporting=true',}]

# returns url
def get_url(v, page_num):
    return v['url_base'] + v['url_before_page'] + str(page_num) + v['url_after_page']


while True:
    gc.enable()

    for v_idx,v in enumerate(v_list):

        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)


        # magic happens here
        url = get_url(v, 1)


        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)

        # collects unlinked objects
        gc.collect()

Output:

                   types |   # objects |   total size
======================== | =========== | ============
                    list |       26154 |     10.90 MB
                     str |       31202 |      1.90 MB
                    dict |         507 |    785.88 KB

Especially the list entry is getting bigger each cycle, by around 600 KB, and I have no idea why. In my opinion I don't store anything here, and the url variable should be overwritten each time. So basically there shouldn't be any memory growth at all.

What am I missing here? :-)

Solution

This "memory leak" is 100% caused by your testing for memory leaks. The all_objects list ends up maintaining a list of almost every object you ever created—even the ones you don't need anymore, which would have been cleaned up if they weren't in all_objects, but they are.

As a quick test:

  • If I run this code as-is, I get the list value growing by about 600KB/cycle, just as you say in your question, at least up to 20MB, where I killed it.

  • If I add del all_objects right after the sum1 = line, however, I get the list value bouncing back and forth between 100KB and 650KB.

If you think about why this is happening, it's pretty obvious in retrospect. At the point when you call muppy.get_objects() (except the first time), the previous value of all_objects is still alive. So, it's one of the objects that gets returned. That means that, even when you assign the return value to all_objects, you're not freeing the old value, you're just dropping its refcount from 2 to 1. Which keeps alive not just the old value itself, but every element within it—which, by definition, is everything that was alive last time through the loop.
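You can see this effect in isolation with a couple of lines (a sketch added for illustration; the snapshot1/snapshot2 names are not from the original post):

from pympler import muppy

snapshot1 = muppy.get_objects()   # first snapshot; still referenced below
snapshot2 = muppy.get_objects()   # taken while snapshot1 is still alive

# snapshot1 is itself one of the live objects the second call discovers,
# so it (and everything it references) stays alive as long as snapshot2 does.
print(any(obj is snapshot1 for obj in snapshot2))  # expected to print True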

If you can find a memory-exploring library that gives you weakrefs instead of normal references, that might help. Otherwise, make sure to do a del all_objects at some point before calling muppy.get_objects again. (Right after the only place you use it, the sum1 = line, seems like the most obvious place.)
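Applied to the loop from the question, that suggestion might look like this (a sketch reusing the imports and definitions from the question's script; only the del all_objects lines are new relative to the original code):

while True:
    gc.enable()

    for v_idx, v in enumerate(v_list):

        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        del all_objects  # new: drop the snapshot so the next get_objects() call cannot find it
        summary.print_(sum1)

        # magic happens here
        url = get_url(v, 1)

        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        del all_objects  # new: same as above
        summary.print_(sum1)

        # collects unlinked objects
        gc.collect()

With that change the list row in the summary should stop growing from cycle to cycle, matching the behavior described in the quick test above.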
