Python: How to use Value and Array in a multiprocessing Pool
Question
With multiprocessing and Process, I can use Value and Array by passing them through the args parameter.
With multiprocessing and Pool, how can I use Value and Array? There is nothing in the docs on how to do this.
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])
I am trying to use Value and Array within the code snippet below.
import multiprocessing
from multiprocessing import Value, Array

def do_calc(data):
    # access num or
    # work to update arr
    newdata = data * 2
    return newdata

def start_process():
    print('Starting', multiprocessing.current_process().name)

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    inputs = list(range(10))
    print('Input   :', inputs)
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, initializer=start_process)
    pool_outputs = pool.map(do_calc, inputs)
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks
    print('Pool    :', pool_outputs)
Recommended answer
I never knew "the reason" for this, but multiprocessing (mp) uses different pickler/unpickler mechanisms for functions passed to most Pool methods. As a consequence, objects created by things like mp.Value, mp.Array, mp.Lock, ..., can't be passed as arguments to such methods, although they can be passed as arguments to mp.Process and to the optional initializer function of mp.Pool(). Because of the latter, this works:
import multiprocessing as mp

def init(aa, vv):
    global a, v
    a = aa
    v = vv

def worker(i):
    a[i] = v.value * i

if __name__ == "__main__":
    N = 10
    a = mp.Array('i', [0]*N)
    v = mp.Value('i', 3)
    p = mp.Pool(initializer=init, initargs=(a, v))
    p.map(worker, range(N))
    print(a[:])
and prints
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
That's the only way I know of to get this to work across platforms.
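To see why the objects can't simply be handed to Pool methods, here is a minimal sketch. It relies on the fact that Pool methods pickle their arguments to send them to the workers, and it tries that pickling step directly. The helper name try_pickle is made up for illustration. On CPython, the shared objects refuse to be pickled outside of process creation and raise a RuntimeError.

```python
import multiprocessing as mp
import pickle

def try_pickle(obj):
    """Return the error message raised when pickling obj, or None on success.

    try_pickle is a hypothetical helper used only for this illustration.
    """
    try:
        pickle.dumps(obj)
    except RuntimeError as exc:
        return str(exc)
    return None

if __name__ == "__main__":
    # Shared objects refuse to be pickled unless a child process is being created.
    for obj in (mp.Value('i', 3), mp.Array('i', [0] * 3), mp.Lock()):
        print(type(obj).__name__, "->", try_pickle(obj))
    # An ordinary list pickles without complaint.
    print("list ->", try_pickle([1, 2, 3]))
```

Passing the objects through initargs sidesteps this, because they are handed over while each worker process is being created rather than through the task queue.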
On Linux-y platforms (where mp creates new processes via fork()), you can instead create your mp.Array and mp.Value (etc.) objects as module globals any time before you do mp.Pool(). Processes created by fork() inherit whatever is in the module global address space at the time mp.Pool() executes.
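A minimal sketch of that fork-only variant, assuming a POSIX platform where the "fork" start method is available. The function name run_with_fork and the pool size are illustrative choices. The sketch requests fork explicitly through mp.get_context() because fork is not the default everywhere (macOS, for example, has defaulted to spawn since Python 3.8).

```python
import multiprocessing as mp

# Assumption: a POSIX platform where the "fork" start method is available.
# The shared objects are module globals, created before any pool exists.
a = mp.Array('i', [0] * 10)
v = mp.Value('i', 3)

def worker(i):
    # Under fork, the child inherits the parent's globals, including a and v.
    a[i] = v.value * i

def run_with_fork():
    # get_context("fork") raises ValueError where fork is unsupported.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=4) as pool:
        pool.map(worker, range(len(a)))
    return a[:]

if __name__ == "__main__":
    print(run_with_fork())
```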
But that doesn't work at all on platforms (read "Windows") that don't support fork().
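Applied to the do_calc snippet from the question, the cross-platform initializer approach might look like the sketch below. The name init_worker, the pool size, and what do_calc does with num and arr (storing each result in its own slot and accumulating a running total) are assumptions made for illustration; the question leaves that part open. The total is updated under num.get_lock() because += on a shared Value is a read-modify-write.

```python
import multiprocessing as mp

def init_worker(shared_num, shared_arr):
    # Runs once in every worker process; keep the shared objects in globals.
    global num, arr
    num = shared_num
    arr = shared_arr

def do_calc(data):
    newdata = data * 2
    arr[data] = newdata        # each task writes only its own slot
    with num.get_lock():       # guard the read-modify-write on the total
        num.value += newdata
    return newdata

if __name__ == "__main__":
    num = mp.Value('d', 0.0)
    arr = mp.Array('i', range(10))
    inputs = list(range(10))
    with mp.Pool(processes=4, initializer=init_worker,
                 initargs=(num, arr)) as pool:
        pool_outputs = pool.map(do_calc, inputs)
    print('Pool    :', pool_outputs)
    print('Array   :', arr[:])
    print('Total   :', num.value)
```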