Python: How to use Value and Array in Multiprocessing pool


Question

When using multiprocessing with Process, I can share a Value or an Array with the child by passing it through the args parameter.

When using multiprocessing with Pool, how can I use Value and Array? There is nothing in the docs on how to do this.

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

I am trying to use Value and Array within the code snippet below.

import multiprocessing
from multiprocessing import Value, Array


def do_calc(data):
    #  access num or
    #  work to update arr
    newdata = data * 2
    return newdata

def start_process():
    # Runs once in each worker process when the pool starts it.
    print('Starting', multiprocessing.current_process().name)

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    inputs = list(range(10))
    print('Input   :', inputs)

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, initializer=start_process)
    pool_outputs = pool.map(do_calc, inputs)
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks

    print('Pool    :', pool_outputs)

Recommended answer

I never knew "the reason" for this, but multiprocessing (mp) uses different pickler/unpickler mechanisms for functions passed to most Pool methods. As a consequence, objects created by things like mp.Value, mp.Array, mp.Lock, ..., can't be passed as arguments to such methods, although they can be passed as arguments to mp.Process and to the optional initializer function of mp.Pool(). Because of the latter, this works:

import multiprocessing as mp

def init(aa, vv):
    global a, v
    a = aa
    v = vv

def worker(i):
    a[i] = v.value * i

if __name__ == "__main__":
    N = 10
    a = mp.Array('i', [0]*N)
    v = mp.Value('i', 3)
    p = mp.Pool(initializer=init, initargs=(a, v))
    p.map(worker, range(N))
    print(a[:])

This prints:

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

That's the only way I know of to get this to work across platforms.
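To see the restriction described above, here is a minimal sketch of what happens when a Value is passed directly as a task argument instead of through the initializer. The names read_value and try_direct_pass, and the start_method parameter, are made up for this illustration. In the CPython versions I have tried, pickling the shared object for the task queue raises a RuntimeError, which Pool.map re-raises in the parent; treat the exact exception type as an observation rather than a documented guarantee.

```python
import multiprocessing as mp

def read_value(shared):
    # Hypothetical task function. It is not expected to run at all:
    # pickling `shared` for the pool's task queue should fail first.
    return shared.value

def try_direct_pass(start_method=None):
    """Pass a Value straight to Pool.map and return the exception, if any."""
    # start_method=None selects the platform's default start method.
    ctx = mp.get_context(start_method)
    v = ctx.Value('i', 3)
    with ctx.Pool(2) as pool:
        try:
            pool.map(read_value, [v])
        except RuntimeError as exc:
            return exc
    return None

if __name__ == "__main__":
    # Expect a message saying such objects should only be shared
    # between processes through inheritance.
    print(repr(try_direct_pass()))
```

Handing the same object to the pool through initializer and initargs, as in the answer's example, avoids the problem because those arguments are given to each worker when the worker process is created rather than being sent through the task queue.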

On Linux-y platforms (where mp creates new processes via fork()), you can instead create your mp.Array and mp.Value (etc) objects as module globals any time before you do mp.Pool(). Processes created by fork() inherit whatever is in the module global address space at the time mp.Pool() executes.

But that doesn't work at all on platforms (read "Windows") that don't support fork().
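The fork-only alternative can be sketched as follows. This is an illustration under stated assumptions, not part of the original answer: it requests the "fork" start method explicitly through mp.get_context, because the platform default is not always fork, and the function name run_with_fork_globals is invented here. On a platform without fork(), the get_context("fork") call raises ValueError.

```python
import multiprocessing as mp

# Assumption: ask for the "fork" start method explicitly instead of
# relying on the platform default. Raises ValueError where fork is absent.
ctx = mp.get_context("fork")

# Shared objects live as module globals and are created before the pool,
# so the forked workers inherit them without any pickling.
N = 10
a = ctx.Array('i', [0] * N)
v = ctx.Value('i', 3)

def worker(i):
    # Each task writes to its own slot, so no extra locking is needed here.
    a[i] = v.value * i

def run_with_fork_globals():
    with ctx.Pool(4) as pool:
        pool.map(worker, range(N))
    return a[:]

if __name__ == "__main__":
    print(run_with_fork_globals())
```

Compared with the initializer approach, this saves the init function and the global statement, at the cost of portability.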
