High Memory Usage Using Python Multiprocessing
Question
I have seen a couple of posts on memory usage with the Python multiprocessing module. However, those questions don't seem to answer the problem I have here. I am posting my analysis in the hope that someone can help me.
I am using multiprocessing to perform tasks in parallel, and I noticed that the memory consumption of the worker processes grows indefinitely. I have a small standalone example that should reproduce what I observe.
import multiprocessing as mp
import time

def calculate(num):
    l = [num*num for num in range(num)]
    s = sum(l)
    del l  # delete lists as an option
    return s

if __name__ == "__main__":
    pool = mp.Pool(processes=2)
    time.sleep(5)
    print "launching calculation"
    num_tasks = 1000
    tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
    for f in tasks:
        print f.get(5)
    print "calculation finished"
    time.sleep(10)
    print "closing pool"
    pool.close()
    print "closed pool"
    print "joining pool"
    pool.join()
    print "joined pool"
    time.sleep(5)
System

I am running Windows, and I use the Task Manager to monitor memory usage. I am running Python 2.7.6.
I have summarized the memory consumption of the 2 worker processes below.
+-----------+---------------------+---------------------+
|           |   memory with del   | memory without del  |
| num_tasks +----------+----------+----------+----------+
|           |  proc_1  |  proc_2  |  proc_1  |  proc_2  |
+-----------+----------+----------+----------+----------+
|   1000    |   4884   |   4694   |   4892   |   4952   |
|   5000    |   5588   |   5596   |   6140   |   6268   |
|  10000    |   6528   |   6580   |   6640   |   6644   |
+-----------+----------+----------+----------+----------+
In the table above, I tried varying the number of tasks and observed the memory consumed at the end of all calculations, before join-ing the pool. The 'del' and 'without del' options correspond to un-commenting or commenting out the del l line inside the calculate(num) function, respectively. Before the calculation, the memory consumption is around 4400.
- It seems that manually clearing the list reduces the memory usage of the worker processes. I assumed the garbage collector would have taken care of this. Is there a way to force garbage collection?
- Puzzlingly, memory usage keeps growing with the number of tasks in both cases. Is there a way to limit the memory usage?
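On the first question: one way to force a collection is gc.collect(). A minimal sketch of my own (not from the original post) of a variant of the calculate() function above; note that even after an explicit collection, CPython may keep the freed memory pooled for reuse rather than returning it to the OS, which is consistent with the Task Manager numbers continuing to grow:

```python
import gc

def calculate(num):
    # hypothetical variant of calculate() from the example above:
    # explicitly run the garbage collector before returning
    l = [n * n for n in range(num)]
    s = sum(l)
    del l
    gc.collect()  # force a full collection pass
    return s

print(calculate(10))  # -> 285 (sum of squares 0..9)
```

In practice gc.collect() mostly helps with reference cycles; a plain list of integers is freed by reference counting as soon as `del l` runs, so the collection call may not change what the OS reports at all.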
I have a process that is based on this example and is meant to run long term. I observe that the worker processes hog a lot of memory (~4GB) after an overnight run. Doing a join to release the memory is not an option, and I am trying to figure out a way to do this without join-ing.
This seems a little mysterious. Has anyone encountered something similar? How can I fix this issue?
Answer
I did a lot of research and could not find a solution that fixes the problem per se. But there is a decent workaround that prevents the memory blowout for a small cost, worthwhile especially for long-running server-side code.
The solution, essentially, is to restart individual worker processes after a fixed number of tasks. The Pool class in Python takes maxtasksperchild as an argument. You can specify maxtasksperchild=1000, limiting each child process to 1000 tasks; after reaching the maxtasksperchild count, the pool replaces its child processes with fresh ones. Using a prudent number for the maximum tasks, one can balance the maximum memory consumed against the start-up cost of restarting the back-end processes. The Pool construction is done as:
pool = mp.Pool(processes=2, maxtasksperchild=1000)
I am putting my full solution here so it can be of use to others!
import multiprocessing as mp
import time

def calculate(num):
    l = [num*num for num in range(num)]
    s = sum(l)
    del l  # delete lists as an option
    return s

if __name__ == "__main__":
    # fix is in the following line #
    pool = mp.Pool(processes=2, maxtasksperchild=1000)
    time.sleep(5)
    print "launching calculation"
    num_tasks = 1000
    tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
    for f in tasks:
        print f.get(5)
    print "calculation finished"
    time.sleep(10)
    print "closing pool"
    pool.close()
    print "closed pool"
    print "joining pool"
    pool.join()
    print "joined pool"
    time.sleep(5)
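To see the worker recycling in action, here is a small sketch of my own (not from the original answer; the helper names work and distinct_workers are mine, and it is written so it runs under both Python 2 and 3). It counts how many distinct worker PIDs end up serving the tasks: with a large maxtasksperchild only the original pool workers appear, while with maxtasksperchild=1 every task lands in a freshly started process.

```python
import multiprocessing as mp
import os

def work(_):
    # each task just reports which worker process ran it
    return os.getpid()

def distinct_workers(num_tasks, maxtasks):
    # with maxtasksperchild=maxtasks, each child exits after completing
    # that many tasks and the pool replaces it with a fresh process
    pool = mp.Pool(processes=2, maxtasksperchild=maxtasks)
    pids = pool.map(work, range(num_tasks))
    pool.close()
    pool.join()
    return len(set(pids))

if __name__ == "__main__":
    print(distinct_workers(4, 1000))  # at most 2: the original workers
    print(distinct_workers(4, 1))     # more: workers are recycled per task
```

This also shows the trade-off the answer mentions: every recycled worker is a full process start, so very small maxtasksperchild values trade memory headroom for start-up overhead.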