Use of initialize in Python multiprocessing worker pool


Question

I was looking into multiprocessing.Pool for workers, trying to initialize each worker with some state. The pool accepts a callable, initializer, but that callable isn't passed a reference to the worker being initialized. The few examples I've seen that use it rely on global variables, which seems really nasty.

Is there any good way to initialize worker state using multiprocessing.Pool?

An example:

I have workers, each of which does a bit of relatively expensive initialisation (binding to a socket), which I don't want to repeat every time. I could initialize my sockets by hand and pass them in when I assign work, but sharing file descriptors across processes is complicated, if not impossible. So I would have to initialize and bind every time I wanted to process a request.

Recommended answer

Technically speaking, the right thing to do would be to pass the result of the initialization function as an argument to every function executed by the worker.

It's also true that global variables are fine and safe in this context: by construction they end up as private objects, each living in the separate address space of its own worker process.

My general suggestion is to write functions in a sane, reentrant programming style, and to allow global variables when exploiting the multiprocessing functionality.

Keeping your example, the following send function requires some context (in this case, a socket):

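def send(socket, data):
    # ... your code here: use the socket to transmit data ...
    result = None  # placeholder for whatever your send logic produces
    return result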

For convenience, the initialization code and the base code executed by the worker rely on global variables.

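import multiprocessing

socket = None  # per-process global; every worker holds its own copy

def init(address, port):
    # Runs once in each worker process when the pool starts it.
    global socket
    socket = magic(address, port)  # magic() stands for your own setup code

def job(data):
    # Reading a global needs no "global" statement.
    assert socket is not None
    return send(socket, data)

if __name__ == '__main__':
    # N, address and port are yours to define.
    pool = multiprocessing.Pool(N, init, [address, port])
    pool.map(job, ['foo', 'bar', 'baz'])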

Coded this way, it becomes simple and natural to test without multiprocessing. You can think of the global state as a perfectly safe context capsule.
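As a rough illustration of that point, the sketch below exercises an init/job pair in the current process, with no pool involved. FakeSocket and its sendall method are hypothetical stand-ins for whatever magic(address, port) would return, and the address and port values are arbitrary. The global is named conn here only to avoid shadowing the standard socket module.

```python
# Hypothetical stand-in for the real connection object; it only records
# what was sent so that a test can inspect it.
class FakeSocket:
    def __init__(self, address, port):
        self.address = address
        self.port = port
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)
        return len(data)


conn = None  # per-process context, filled in by init()


def send(conn, data):
    # Reentrant: everything it needs arrives as an argument.
    return conn.sendall(data)


def init(address, port):
    # The same function you would hand to Pool as the initializer.
    global conn
    conn = FakeSocket(address, port)


def job(data):
    assert conn is not None, "init() has not run in this process"
    return send(conn, data)


# Plain in-process test: call the initializer by hand, then the job.
init("localhost", 9000)
results = [job(item) for item in ["foo", "bar", "bazz"]]
print(results)
```

Because send takes its context as a parameter and job is the only function that touches the global, swapping the fake for a real connection should require changes in init alone.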

As an additional point of convenience, keep in mind that multiprocessing is not very good at sending complex data around (e.g. callbacks), because arguments and results are pickled to cross the process boundary. The best approach is to send simple pieces of data (strings, lists, dictionaries, collections.namedtuple ...) and reconstruct the complex data structures on the worker side (using the initialization function).
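One way to picture this, as a minimal sketch rather than a definitive recipe: lambdas cannot be pickled, so instead of sending callables the parent sends their names as strings, and each worker rebuilds the callables in its initializer. The names init_worker, apply_op, run_in_pool and the two operations are all made up for the illustration.

```python
import multiprocessing

# Per-process registry of callables, filled in by init_worker().
_operations = {}


def init_worker(names):
    # Rebuild the "complex" objects (callables) from simple data (names).
    available = {
        "double": lambda x: 2 * x,
        "negate": lambda x: -x,
    }
    _operations.clear()
    for name in names:
        _operations[name] = available[name]


def apply_op(task):
    # A task is a plain (name, value) tuple, which pickles without trouble.
    name, value = task
    return _operations[name](value)


def run_in_pool(names, tasks, workers=2):
    # Call this from under an `if __name__ == "__main__":` guard.
    with multiprocessing.Pool(workers, init_worker, [names]) as pool:
        return pool.map(apply_op, tasks)
```

The initializer and the task function can again be checked in a single process by calling init_worker(["double", "negate"]) and then apply_op(("double", 3)) directly.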
