Python: How to use Value and Array in a multiprocessing Pool

Question

With multiprocessing's Process, I can use Value and Array by passing them through the args parameter.

With multiprocessing's Pool, how can I use Value and Array? There is nothing in the docs on how to do this.

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

I am trying to use Value and Array within the code snippet below.

import multiprocessing
from multiprocessing import Value, Array


def do_calc(data):
    # goal: read num and/or update arr here
    newdata = data * 2
    return newdata

def start_process():
    print('Starting', multiprocessing.current_process().name)

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    inputs = list(range(10))
    print('Input   :', inputs)

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, initializer=start_process)
    pool_outputs = pool.map(do_calc, inputs)
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks

    print('Pool    :', pool_outputs)

Recommended answer

I never knew "the reason" for this, but multiprocessing (mp) uses different pickler/unpickler mechanisms for the arguments of functions passed to most Pool methods. As a consequence, objects created by things like mp.Value, mp.Array, mp.Lock, ..., can't be passed as arguments to such methods, although they can be passed as arguments to mp.Process and, through initargs, to the optional initializer function of mp.Pool(). Because of the latter, this works:

import multiprocessing as mp

def init(aa, vv):
    global a, v
    a = aa
    v = vv

def worker(i):
    a[i] = v.value * i

if __name__ == "__main__":
    N = 10
    a = mp.Array('i', [0]*N)
    v = mp.Value('i', 3)
    p = mp.Pool(initializer=init, initargs=(a, v))
    p.map(worker, range(N))
    print(a[:])

This prints:

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

That's the only way I know of to get this to work across platforms.
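Here is a minimal sketch of that cross-platform initializer approach applied to the question's own snippet. It assumes Python 3. The names num, arr, inputs and do_calc come from the question. The init function and the locking around the counter update are illustrative additions. get_lock() returns the lock that mp.Value creates by default.

```python
import multiprocessing as mp


def init(shared_num, shared_arr):
    # Runs once in every worker process: keep the shared objects in
    # module globals so that task functions can reach them.
    global num, arr
    num = shared_num
    arr = shared_arr


def do_calc(data):
    # "+=" on a Value is a read-modify-write, so hold the Value's own lock;
    # otherwise concurrent workers could lose updates.
    with num.get_lock():
        num.value += 1
    # Each task writes a different index, so arr needs no extra locking here.
    arr[data] = data * 2
    return data * 2


if __name__ == "__main__":
    num = mp.Value('d', 0.0)
    arr = mp.Array('i', range(10))
    inputs = list(range(10))
    print('Input   :', inputs)

    # Hand the shared objects to the workers at creation time via initargs.
    with mp.Pool(initializer=init, initargs=(num, arr)) as pool:
        pool_outputs = pool.map(do_calc, inputs)

    print('Pool    :', pool_outputs)
    print('num     :', num.value)  # 10.0: one increment per input
    print('arr     :', arr[:])     # [0, 2, 4, ..., 18]
```

Only the counter update needs the lock. Every task writes its own slot of arr, so those writes never collide.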

On Linux-y platforms (where mp creates new processes via fork()), you can instead create your mp.Array and mp.Value (etc.) objects as module globals any time before you call mp.Pool(). Processes created by fork() inherit whatever is in the module global address space at the time mp.Pool() executes.
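Here is a minimal sketch of that fork-based variant, assuming a POSIX system. It requests the "fork" start method explicitly with mp.get_context("fork"), because fork is not the default start method everywhere (macOS, for example, defaults to spawn). The shared objects are created from that same context at module level, before the pool exists, so no initializer is needed.

```python
import multiprocessing as mp

# POSIX only: on Windows, get_context("fork") raises ValueError.
ctx = mp.get_context("fork")

N = 10
# Module globals created before the pool: fork()ed workers inherit them.
a = ctx.Array('i', [0] * N)
v = ctx.Value('i', 3)


def worker(i):
    # a and v are simply the inherited globals; nothing is passed in.
    a[i] = v.value * i


if __name__ == "__main__":
    with ctx.Pool() as p:
        p.map(worker, range(N))
    print(a[:])  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```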

But that doesn't work at all on platforms (read "Windows") that don't support fork().
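The following minimal sketch shows the restriction from the start of this answer in action. It assumes CPython 3, and worker is a hypothetical task function. The code passes an mp.Value to pool.map as part of each task's argument. The task cannot be pickled onto the pool's task queue, so map raises a RuntimeError saying that Synchronized objects should only be shared between processes through inheritance. That wording fits the behaviour described above: the objects are accepted when handed over while a child process is being created (Process args, Pool initargs), but not later as task arguments.

```python
import multiprocessing as mp


def worker(args):
    # Never actually runs: the task carrying the Value cannot be pickled.
    v, i = args
    return v.value * i


if __name__ == "__main__":
    v = mp.Value('i', 3)
    error_message = None
    with mp.Pool(2) as p:
        try:
            p.map(worker, [(v, i) for i in range(4)])
        except RuntimeError as exc:
            error_message = str(exc)
    print("RuntimeError:", error_message)
```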
