Python parallel map (multiprocessing.Pool.map) with global data
Question
I'm trying to call a function on multiple processes. The obvious solution is Python's multiprocessing module. The problem is that the function has side effects: it creates a temporary file and registers that file to be deleted on exit using atexit.register and a global list. The following should demonstrate the problem (in a different context).
import multiprocessing as multi

glob_data = []

def func(a):
    glob_data.append(a)

map(func, range(10))
print glob_data  # [0,1,2,3,4 ... , 9] Good.

p = multi.Pool(processes=8)
p.map(func, range(80))
print glob_data  # [0,1,2,3,4, ... , 9] Bad, glob_data wasn't updated.
Is there any way to have the global data updated?
Note that if you try out the above script, you probably shouldn't run it from the interactive interpreter, since multiprocessing requires the module __main__ to be importable by child processes.
Update
Adding the global keyword in func doesn't help, e.g.:
def func(a):  # Still doesn't work.
    global glob_data
    glob_data.append(a)
Recommended answer
You need the list glob_data to be shared between the processes. multiprocessing's Manager gives you just that: it keeps the list in a separate server process and hands each worker a proxy to it.
import multiprocessing as multi
from multiprocessing import Manager

manager = Manager()
glob_data = manager.list([])

def func(a):
    glob_data.append(a)

map(func, range(10))
print glob_data  # [0,1,2,3,4 ... , 9] Good.

p = multi.Pool(processes=8)
p.map(func, range(80))
print glob_data  # Super Good.
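The listing above uses Python 2 syntax (the print statement, an eager map). As a rough Python 3 sketch of the same idea, the version below creates the Manager inside a function and passes the list proxy to the workers as an argument. The helper name run and the use of functools.partial are assumptions made for this illustration, not part of the answer. Passing the proxy explicitly is one way to avoid relying on a module-level global, which matters when the start method re-imports the main module in each worker.

```python
import multiprocessing as multi
from functools import partial


def func(shared, a):
    # Runs in a worker process; `shared` is a proxy to the manager's list.
    shared.append(a)


def run(n_items=80, processes=8):
    # Hypothetical helper (not from the answer): builds the shared list,
    # fills it from a pool of workers, and returns a sorted plain copy.
    with multi.Manager() as manager:
        glob_data = manager.list()
        with multi.Pool(processes=processes) as pool:
            pool.map(partial(func, glob_data), range(n_items))
        # Copy out of the proxy before the manager shuts down.
        return sorted(list(glob_data))


if __name__ == "__main__":
    print(run())
```

The workers append in whatever order they happen to run, so the sketch sorts the result before returning it.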
For some background:
https://docs.python.org/3/library/multiprocessing.html#managers
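For the temporary-file case described in the question, a possible alternative that needs no shared state is to have each worker return the path it created and let the parent collect the paths from the return value of map. This is not what the answer proposes; it is a minimal sketch in Python 3, and the names make_temp, make_all and cleanup are hypothetical.

```python
import atexit
import multiprocessing as multi
import os
import tempfile


def make_temp(a):
    # Worker: create a temp file and return its path instead of
    # appending to a global list.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(str(a))
    return path


def make_all(items, processes=4):
    # Parent: the list of paths is assembled here from map's return value.
    with multi.Pool(processes=processes) as pool:
        return pool.map(make_temp, items)


def cleanup(paths):
    # Remove every file that still exists.
    for path in paths:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    created = make_all(range(10))
    # Register the cleanup once, in the parent process.
    atexit.register(cleanup, created)
    print(len(created))
```

Here the parent owns the list and the exit handler, so nothing has to be synchronized between processes.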