Memory usage keeps growing with Python's multiprocessing.pool
Problem description
Here's the program:
#!/usr/bin/python
import multiprocessing

def dummy_func(r):
    pass

def worker():
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    for index in range(0, 100000):
        pool.apply_async(worker, callback=dummy_func)

    # clean up
    pool.close()
    pool.join()
I found that memory usage (both VIRT and RES) kept growing until close()/join(); is there any solution to get rid of this? I tried maxtasksperchild with 2.7, but it didn't help either.
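For reference, maxtasksperchild is a keyword argument to the Pool constructor: each worker process is replaced after completing that many tasks, so memory a worker accumulates is released with it. A minimal sketch of how it is passed (the square function and the numbers are illustrative, not from the question):

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    # Each worker exits and is replaced after 50 tasks, releasing its memory
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=50)
    results = pool.map(square, range(1000))
    pool.close()
    pool.join()
    print(results[10])  # 100
```

This bounds per-worker memory, but it does not help here: the growth in the question comes from the main process accumulating pending-result bookkeeping, not from the workers.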
I have a more complicated program that calls apply_async() ~6M times, and at the ~1.5M point I already had 6G+ RES; to rule out all other factors, I simplified the program to the version above.
It turned out this version works better; thanks for everyone's input:
#!/usr/bin/python
import multiprocessing

ready_list = []

def dummy_func(index):
    global ready_list
    ready_list.append(index)

def worker(index):
    return index

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    result = {}
    for index in range(0, 1000000):
        result[index] = pool.apply_async(worker, (index,), callback=dummy_func)
        for ready in ready_list:
            result[ready].wait()
            del result[ready]
        ready_list = []

    # clean up
    pool.close()
    pool.join()
I didn't put any lock there, as I believe the main process is single-threaded (the callback is more or less an event-driven thing, per the docs I read).
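An alternative way to keep the pending-result backlog bounded, without tracking AsyncResult objects in a dict at all, is to gate submissions with a semaphore that the callback releases. This is not the asker's code, just a sketch of the pattern; note that callbacks actually run in the pool's internal result-handler thread, which is why a thread-safe primitive like threading.BoundedSemaphore is used here:

```python
import multiprocessing
import threading

def worker(index):
    return index

def run(total=1000, max_in_flight=32):
    # At most max_in_flight tasks are queued at once: the main thread
    # acquires before submitting, the callback releases on completion.
    sem = threading.BoundedSemaphore(max_in_flight)
    done = []

    def on_done(result):
        done.append(result)  # runs in the pool's result-handler thread
        sem.release()

    pool = multiprocessing.Pool(processes=4)
    for i in range(total):
        sem.acquire()
        pool.apply_async(worker, (i,), callback=on_done)
    pool.close()
    pool.join()  # joins the result handler too, so all callbacks have fired
    return len(done)

if __name__ == '__main__':
    print(run())  # 1000
```

Because submission blocks once max_in_flight results are outstanding, the main process never holds more than a fixed number of AsyncResult objects.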
I changed v1's index range to 1,000,000, same as v2, and did some tests - it's weird to me that v2 is even ~10% faster than v1 (33s vs 37s); maybe v1 was doing too many internal list-maintenance jobs. v2 is definitely the winner on memory usage: it never went over 300M (VIRT) and 50M (RES), while v1 used to be 370M/120M, the best being 330M/85M. All numbers are from just 3~4 test runs, for reference only.
Recommended answer
I had memory issues recently, since I was calling my multiprocessing function multiple times, so it kept spawning processes and leaving them in memory.
Here's the solution I'm using now:
def myParallelProcess(ahugearray):
    from multiprocessing import Pool
    from contextlib import closing
    with closing(Pool(15)) as p:
        # imap_unordered returns a lazy iterator (chunksize=100), so
        # consume it before leaving the with-block, which close()s the pool
        return list(p.imap_unordered(simple_matching, ahugearray, 100))
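On Python 3.3+, Pool is itself a context manager, so contextlib.closing is no longer needed; note that Pool.__exit__ calls terminate(), so the results must still be materialized inside the block. A minimal sketch, with a stand-in square function since the answer's simple_matching isn't shown:

```python
from multiprocessing import Pool

def square(x):  # stand-in for the answer's simple_matching
    return x * x

def my_parallel_process(ahugearray, chunksize=100):
    # Pool.__exit__ calls terminate(), so consume the lazy
    # imap_unordered iterator before the with-block ends
    with Pool(4) as p:
        return list(p.imap_unordered(square, ahugearray, chunksize))

if __name__ == '__main__':
    print(sorted(my_parallel_process(range(10))))
```

The chunksize argument batches items sent to each worker, which matters for throughput when iterating over a huge array of small items.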