Python multiprocessing and Manager


Question

I am using Python's multiprocessing to create a parallel application. Processes need to share some data, for which I use a Manager. However, I have some common functions which processes need to call and which need to access the data stored by the Manager object. My question is whether I can avoid needing to pass the Manager instance to these common functions as an argument and rather use it like a global. In other words, consider the following code:

import multiprocessing as mp

manager = mp.Manager()
global_dict = manager.dict(a=[0])

def add():
    global_dict['a'] += [global_dict['a'][-1]+1]

def foo_parallel(var):
    add()
    print(var)

num_processes = 5
p = []
for i in range(num_processes):
    p.append(mp.Process(target=foo_parallel,args=(global_dict,)))

[pi.start() for pi in p]
[pi.join() for pi in p]

This runs fine and, on my machine, ends with global_dict['a'] == [0, 1, 2, 3, 4, 5]. However, is this "good form"? Is it just as good as defining add(var) and calling add(var) instead?

Answer

Your code example has bigger problems than form. You get your desired output only by luck; repeated execution will yield different results. That's because += is not an atomic operation: multiple processes can read the same old value, one after another, before any of them has written back its update, and they will all write back the same value. To prevent this behaviour, you have to use a Manager.Lock in addition.
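To see where it goes wrong, it helps to spell out the read-modify-write cycle hidden inside the question's add(). A minimal sketch of what the augmented assignment on the managed dict breaks down into (the name local is illustrative):

local = global_dict['a']      # 1. read: the dict proxy returns a copy of the list
local += [local[-1] + 1]      # 2. modify: only the local copy changes
global_dict['a'] = local      # 3. write back: overwrites whatever another process
                              #    stored between steps 1 and 3

Two processes that both perform step 1 before either reaches step 3 will write back the same list, and one increment is lost.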

To your original question about "good form":

IMO it would be cleaner to let the main function of the child process, foo_parallel, pass global_dict explicitly into a generic function add(var). That would be a form of dependency injection and has some advantages. In your example, non-exhaustively, it:

  • allows isolated testing (see the sketch after this list)
  • increases code reusability
  • makes debugging easier (detecting non-accessibility of the managed object isn't delayed until add is called; fail fast)
  • reduces boilerplate code (for example, try/except blocks around resources that multiple functions need)
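As a minimal sketch of the first point, using the add(l) defined in the code below: because add receives its data as a parameter, it can be exercised with a plain list, with no Manager and no child process involved (the test function name is illustrative):

def test_add():
    assert add([0]) == [0, 1]
    assert add([0, 1, 2]) == [0, 1, 2, 3]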

As a side note: using a list comprehension only for its side effects is considered a 'code smell'. If you don't need the resulting list, just use a for-loop, as the code below does.

Code:

import os
from multiprocessing import Process, Manager


def add(l):
    # Generic helper: append the next integer to the list it receives.
    l += [l[-1] + 1]
    return l


def foo_parallel(global_dict, lock):
    # Hold the lock for the whole read-modify-write cycle so that
    # concurrent updates cannot overwrite each other.
    with lock:
        l = global_dict['a']
        global_dict['a'] = add(l)
        print(os.getpid(), global_dict)


if __name__ == '__main__':

    N_WORKERS = 5

    with Manager() as manager:

        lock = manager.Lock()
        global_dict = manager.dict(a=[0])

        pool = [Process(target=foo_parallel, args=(global_dict, lock))
                for _ in range(N_WORKERS)]

        for p in pool:
            p.start()

        for p in pool:
            p.join()

        print('result', global_dict)
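With the lock in place, the five increments are serialized, so repeated runs reliably end with result {'a': [0, 1, 2, 3, 4, 5]}; only the order of the per-process print lines varies between runs.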
