Python memory consumption on Linux: physical and virtual memory are growing while the heap size remains the same

Problem description

I'm working on some kind of system service (actually it's just a log parser) written in Python. This program should work continuously for a long time (I mean days and weeks without failures or a need to restart). That's why I am concerned about memory consumption.

I put together information about process memory usage from various sources into one simple function:

#!/usr/bin/env python
from guppy import hpy
from datetime import datetime
import os
import resource
import re

def debug_memory_leak():
    # Virtual memory size: the VmSize field of /proc/<pid>/status, in kB
    pid = os.getpid()
    with open(os.path.join("/proc", str(pid), "status")) as f:
        lines = f.readlines()
    _vmsize = [l for l in lines if l.startswith("VmSize")][0]
    vmsize = int(_vmsize.split()[1])

    # Physical memory: ru_maxrss is the *peak* resident set size, in kB on Linux
    pmsize = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    # Dynamic memory as seen by guppy: total number of objects and heap size
    h = hpy().heap()
    if __debug__:
        print str(h)
    m = re.match(
        r"Partition of a set of ([0-9]+) objects\. Total size = ([0-9]+) bytes(.*)",
        str(h))
    objects = m.group(1)
    heap = int(m.group(2)) / 1024  # bytes -> kB

    current_time = datetime.now().strftime("%H:%M:%S")
    data = (current_time, objects, heap, pmsize, vmsize)
    print("\t".join([str(d) for d in data]))

This function has been used to study the dynamics of the memory consumption of my long-running process, and I still cannot explain its behavior. You can see that the heap size and the total number of objects did not change, while the physical and virtual memory grew by 11% and 1% respectively during these twenty minutes.
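
For context, the samples below were produced by calling this function periodically. A minimal driver sketch (the ten-second interval is my assumption, matching the timestamps in the table):

import time

if __name__ == "__main__":
    print("Time\tObj\tHeap\tPhM\tVM")
    while True:
        debug_memory_leak()
        time.sleep(10)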

UPD: The process has been running for almost 15 hours at this point. The heap is still the same, but the physical memory has increased sixfold and the virtual memory has increased by 50%. The curve seems to be linear, except for the strange outliers at 3:00 AM:

Time      Obj    Heap (kB)  PhM (kB)  VM (kB)
19:04:19  31424  3928       5460      143732
19:04:29  30582  3704       10276     158240
19:04:39  30582  3704       10372     157772
19:04:50  30582  3709       10372     157772
19:05:00  30582  3704       10372     157772
(...)
19:25:00  30583  3704       11524     159900
09:53:23  30581  3704       62380     210756

I wonder what is going on with the address space of my process. The constant heap size suggests that all of the dynamic objects are being deallocated correctly. But I have no doubt that the growing memory consumption will affect the sustainability of this critical process in the long run.

Could anyone clarify this issue please? Thank you.

(I use RHEL 6.4, kernel 2.6.32-358 with Python 2.6.6)

Solution

Without knowing what your program is doing, this might help.

I came across this article while working on a project a while back: http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html It says: "Long running Python jobs that consume a lot of memory while running may not return that memory to the operating system until the process actually terminates, even if everything is garbage collected properly."
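
You can observe this effect with a small experiment (a rough sketch, Linux-only; it reads the current VmRSS from /proc rather than ru_maxrss, which by definition never decreases):

def current_rss_kb():
    # Current resident set size of this process, in kB (Linux-specific)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS"):
                return int(line.split()[1])

data = [str(i) for i in range(10 ** 6)]   # allocate a million small objects
print("after alloc: %d kB" % current_rss_kb())
del data                                   # everything is collectable now...
print("after del:   %d kB" % current_rss_kb())
# Depending on allocator fragmentation, the second figure may stay
# well above the pre-allocation level even though the objects are gone.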

I ended up using the multiprocessing module to have my project fork a separate process to do the work and return the result, and I haven't noticed any memory issues since.
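
A minimal sketch of that pattern, with a hypothetical process_chunk worker standing in for whatever heavy step your service performs; everything the child allocates is returned to the OS when it exits:

import multiprocessing

def process_chunk(path, queue):
    # Hypothetical heavy step: parse one log file and push a summary back
    count = 0
    with open(path) as f:
        for line in f:
            count += 1
    queue.put(count)

def parse_in_child(path):
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=process_chunk, args=(path, queue))
    worker.start()
    result = queue.get()  # fetch the result before joining
    worker.join()         # the child exits here and its memory is released
    return result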

Alternatively, try it on Python 3.3, where the allocator was changed to return freed memory to the OS more readily: http://bugs.python.org/issue11849
