Python: How to use Value and Array in Multiprocessing pool
Question
For multiprocessing with Process, I can use Value and Array by passing them through the args parameter.
With multiprocessing and Pool, how can I use Value and Array? There is nothing in the docs on how to do this.

The Process version, which works:

    from multiprocessing import Process, Value, Array

    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))

        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()

        print(num.value)
        print(arr[:])

I am trying to use Value and Array within the code snippet below.

    import multiprocessing

    def do_calc(data):
        # access num or
        # work to update arr
        newdata = data * 2
        return newdata

    def start_process():
        print('Starting', multiprocessing.current_process().name)

    if __name__ == '__main__':
        # Value and Array live in the multiprocessing module
        num = multiprocessing.Value('d', 0.0)
        arr = multiprocessing.Array('i', range(10))
        inputs = list(range(10))
        print('Input :', inputs)

        pool_size = multiprocessing.cpu_count() * 2
        pool = multiprocessing.Pool(processes=pool_size, initializer=start_process)
        pool_outputs = pool.map(do_calc, inputs)
        pool.close()  # no more tasks
        pool.join()   # wrap up current tasks

        print('Pool :', pool_outputs)

Recommended answer
I never knew "the reason" for this, but multiprocessing (mp) uses different pickler/unpickler mechanisms for functions passed to most Pool methods. As a consequence, objects created by things like mp.Value, mp.Array, mp.Lock, ..., can't be passed as arguments to such methods, although they can be passed as arguments to mp.Process and to the optional initializer function of mp.Pool(). Because of the latter, this works:

    import multiprocessing as mp

    def init(aa, vv):
        # Runs once in each worker process; stash the shared objects as globals
        global a, v
        a = aa
        v = vv

    def worker(i):
        a[i] = v.value * i

    if __name__ == "__main__":
        N = 10
        a = mp.Array('i', [0]*N)
        v = mp.Value('i', 3)
        p = mp.Pool(initializer=init, initargs=(a, v))
        p.map(worker, range(N))
        print(a[:])

and prints

    [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

That's the only way I know of to get this to work across platforms.
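The restriction on Pool task arguments can be seen without starting a pool at all. The sketch below assumes (as CPython's implementation does, to my knowledge) that Pool ships task arguments to its workers by pickling them, and simply tries to pickle the shared objects directly. The helper name try_pickle is made up for this illustration.

```python
import multiprocessing as mp
import pickle

def try_pickle(obj):
    """Return the error message if pickling obj raises RuntimeError, else None.

    Hypothetical helper for illustration only.
    """
    try:
        pickle.dumps(obj)
    except RuntimeError as exc:
        return str(exc)
    return None

if __name__ == "__main__":
    shared = (mp.Value('i', 3), mp.Array('i', [0] * 3), mp.Lock())
    for obj in shared:
        # Each of these refuses to be pickled outside of process creation.
        print(type(obj).__name__, "->", try_pickle(obj))
    # An ordinary list pickles without complaint.
    print("list ->", try_pickle([1, 2, 3]))
```

The shared objects only allow themselves to be transferred while a child process is being created, which is why mp.Process arguments and Pool's initargs are accepted while arguments to Pool.map are not.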
On Linux-y platforms (where mp creates new processes via fork()), you can instead create your mp.Array and mp.Value (etc.) objects as module globals any time before you do mp.Pool(). Processes created by fork() inherit whatever is in the module global address space at the time mp.Pool() executes.
But that doesn't work at all on platforms (read "Windows") that don't support fork().
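A minimal sketch of the fork-only variant follows. It requests the fork start method explicitly through mp.get_context("fork"), which is my own choice for the illustration and not part of the answer above, so that the behaviour does not depend on the platform default. Where fork is unavailable, get_context raises ValueError. The function name run is likewise made up here.

```python
import multiprocessing as mp

N = 10

# Module globals, created before the pool exists so that forked
# workers inherit them instead of receiving pickled copies.
a = mp.Array('i', [0] * N)
v = mp.Value('i', 3)

def worker(i):
    # a and v refer to the same shared memory as in the parent process.
    a[i] = v.value * i

def run():
    # Assumption: explicitly ask for "fork"; this fails on Windows.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        pool.map(worker, range(N))
    return a[:]

if __name__ == "__main__":
    print(run())
```

No initializer is needed here because the workers are forked after a and v already exist. On a platform with fork this should print the same list as the initializer version.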