Sharing synchronization objects through global namespace vs as a function argument


Problem description

If I need to share a multiprocessing.Queue or a multiprocessing.Manager (or any of the other synchronization primitives), is there any difference in doing it by defining them at the global (module) level, versus passing them as an argument to the function executed in a different process?

For example, here are three possible ways I can imagine a queue could be shared:

# works fine on both Windows and Linux
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

def main():
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()

vs.

# works fine on Linux, hangs on Windows
from multiprocessing import Process, Queue
q = Queue()

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()

vs.

# works fine on Linux, NameError on Windows
from multiprocessing import Process, Queue

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    q = Queue()
    main()

Which is the correct approach? I'm guessing from my experimentation that it's only the first one, but wanted to confirm it's officially the case (and not only for Queue but for Manager and other similar objects).

Recommended answer

Explicitly pass resources to child processes

On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.
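
The same explicit-argument pattern applies to a Manager, which the question also asks about. Here is a minimal sketch (the dict proxy and the 'answer' key are illustrative, not from the question): the manager's proxy object is passed to the child as an argument, and this works on every platform and start method:

from multiprocessing import Process, Manager

def f(shared):
    shared['answer'] = 42      # writes go through the manager's proxy

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()
        p = Process(target=f, args=(shared,))
        p.start()
        p.join()
        print(shared['answer'])   # prints 42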

The issue is the way the spawn/forkserver start methods (Windows only supports spawn) work under the hood. Instead of cloning the parent process with its memory and file descriptors, they create a new process from scratch. A fresh Python interpreter is then launched, which imports the required modules and runs the target. This obviously means your global variable will be a brand new Queue instead of the parent's one.
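
You can watch the re-import happen by forcing the spawn start method even on Linux. A minimal sketch (the worker function and messages are illustrative): the module-level print runs once in the parent and once more in every spawned child, which is exactly why a module-level Queue would be a different object in each process:

import multiprocessing as mp
import os

print(f'module imported in pid {os.getpid()}')  # runs again in every spawned child

def worker(q):
    q.put(os.getpid())           # the explicitly passed queue reaches the parent

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # force spawn regardless of platform
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print('child pid:', q.get())
    p.join()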

Another implication is that the objects you want to pass to the new process must be pickleable as they will be passed through a pipe.
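
As a quick illustration of that constraint (a sketch, using a lambda purely as an example of an unpicklable target): a lambda cannot be pickled, so handing it to a spawn-started process fails as soon as start() tries to send it over:

import multiprocessing as mp

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=lambda: print('hi'))  # lambdas are not picklable
    try:
        p.start()                # pickling the target fails under spawn
    except Exception as e:
        print(type(e).__name__, e)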
