High Memory Usage Using Python Multiprocessing


Problem Description

I have seen a couple of posts on memory usage with the Python multiprocessing module. However, those questions don't seem to answer the problem I have here. I am posting my analysis in the hope that someone can help me.

I am using multiprocessing to perform tasks in parallel, and I noticed that the memory consumption of the worker processes grows indefinitely. I have a small standalone example that should replicate what I am seeing.

import multiprocessing as mp
import time

def calculate(num):
    l = [num*num for num in range(num)]
    s = sum(l)
    del l       # delete lists as an  option
    return s

if __name__ == "__main__":
    pool = mp.Pool(processes=2)
    time.sleep(5)
    print "launching calculation"
    num_tasks = 1000
    tasks =  [pool.apply_async(calculate,(i,)) for i in range(num_tasks)]
    for f in tasks:    
        print f.get(5)
    print "calculation finished"
    time.sleep(10)
    print "closing  pool"
    pool.close()
    print "closed pool"
    print "joining pool"
    pool.join()
    print "joined pool"
    time.sleep(5)

System

I am running Windows and I use Task Manager to monitor the memory usage. I am running Python 2.7.6.
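As an aside (my own sketch, not part of the original post), the per-worker memory can also be checked programmatically, assuming the third-party psutil package is installed; the RSS value below is roughly the figure Task Manager reports in its memory column:

import multiprocessing as mp
import psutil   # third-party package, assumed installed for this sketch

def report_worker_memory():
    # print the resident set size (RSS) of every live child process in KB,
    # roughly what Task Manager shows for each worker
    for child in mp.active_children():
        rss_kb = psutil.Process(child.pid).memory_info().rss / 1024
        print "pid %d: %d KB" % (child.pid, rss_kb)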

I have summarized the memory consumption by the 2 worker processes below.

+---------------+----------------------+----------------------+
|  num_tasks    |  memory with del     | memory without del   |
|               | proc_1   | proc_2    | proc_1   | proc_2    |
+---------------+----------------------+----------------------+
| 1000          | 4884     | 4694      | 4892     | 4952      |
| 5000          | 5588     | 5596      | 6140     | 6268      |
| 10000         | 6528     | 6580      | 6640     | 6644      |
+---------------+----------------------+----------------------+

In the table above, I varied the number of tasks and observed the memory consumed at the end of all calculations and before joining the pool. The 'memory with del' and 'memory without del' columns correspond to whether the del l line inside the calculate(num) function is uncommented or commented out, respectively. Before the calculation, the memory consumption is around 4400.

  1. It looks like manually clearing out the lists results in lower memory usage for the worker processes. I thought the garbage collector would have taken care of this; is there a way to force garbage collection? (A minimal sketch follows this list.)
  2. What is puzzling is that in both cases the memory usage keeps growing as the number of tasks increases. Is there a way to limit the memory usage?
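Regarding question 1, a collection can be triggered explicitly with the gc module. The snippet below is my own sketch rather than part of the original post; note that CPython frees the list through reference counting as soon as del l runs, so gc.collect() mainly matters when reference cycles are involved:

import gc

def calculate(num):
    # same computation as in the example above; the explicit gc.collect()
    # call is a hypothetical tweak showing how a collection can be forced
    l = [i * i for i in range(num)]
    s = sum(l)
    del l           # drop the reference; refcounting frees the list here
    gc.collect()    # force a collector pass (mostly relevant for cycles)
    return s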

I have a process that is based on this example and is meant to run long term. I observe that the worker processes hog a lot of memory (~4 GB) after an overnight run. Doing a join to release the memory is not an option, and I am trying to figure out a way to avoid this without joining.

This seems a little mysterious. Has anyone encountered something similar? How can I fix this issue?

Recommended Answer

I did a lot of research but couldn't find a solution that fixes the problem per se. There is, however, a decent workaround that prevents the memory blowout at a small cost, which is worthwhile especially for long-running, server-side code.

The solution, essentially, is to restart individual worker processes after a fixed number of tasks. The Pool class in Python takes maxtasksperchild as an argument. You can specify maxtasksperchild=1000, limiting each child process to 1000 tasks; once that number is reached, the pool retires the child and spawns a fresh one. By choosing a prudent value for the maximum number of tasks, you can balance the peak memory consumed against the start-up cost of restarting the back-end processes. The Pool construction is done as:

pool = mp.Pool(processes=2,maxtasksperchild=1000)
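To make the recycling visible, a small sketch of my own (not part of the original answer) has each task return os.getpid(); with a deliberately low maxtasksperchild, more than two distinct PIDs show up, confirming that the pool replaces its children:

import multiprocessing as mp
import os

def worker_pid(_):
    # report which worker process handled this task (illustrative helper)
    return os.getpid()

if __name__ == "__main__":
    # with maxtasksperchild=5, each child exits after 5 tasks and the pool
    # spawns a replacement, so the 20 tasks are spread over more than 2 PIDs
    pool = mp.Pool(processes=2, maxtasksperchild=5)
    results = [pool.apply_async(worker_pid, (i,)) for i in range(20)]
    pids = [r.get() for r in results]
    print "distinct worker PIDs:", sorted(set(pids))
    pool.close()
    pool.join()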

I am putting my full solution here so it can be of use to others!

import multiprocessing as mp
import time

def calculate(num):
    l = [num*num for num in range(num)]
    s = sum(l)
    del l       # delete lists as an  option
    return s

if __name__ == "__main__":

    # fix is in the following line #
    pool = mp.Pool(processes=2,maxtasksperchild=1000)

    time.sleep(5)
    print "launching calculation"
    num_tasks = 1000
    tasks =  [pool.apply_async(calculate,(i,)) for i in range(num_tasks)]
    for f in tasks:    
        print f.get(5)
    print "calculation finished"
    time.sleep(10)
    print "closing  pool"
    pool.close()
    print "closed pool"
    print "joining pool"
    pool.join()
    print "joined pool"
    time.sleep(5)
