Preventing pool processes from importing __main__ and globals

Question

I am using a multiprocessing pool of workers as a part of a much larger application. Since I use it for crunching a large volume of simple math, I have a shared-nothing architecture, where the only variables the workers ever need are passed on as arguments. Thus, I do not need the worker subprocesses to import any globals, my __main__ module or, consequently, any of the modules it imports. Is there any way to force such a behavior and avoid the performance hit when spawning the pool?

I should note that my environment is Win32, which lacks os.fork(), so the worker processes are spawned "using a subprocess call to sys.executable (i.e. start a new Python process) followed by serializing all of the globals, and sending those over the pipe", as per this SO post. That said, I want to do as little of the above as possible so that my pool opens faster.

Any ideas?

Recommended answer

Looking at the multiprocessing.forking implementation, particularly get_preparation_data and prepare (win32-specific), globals aren't getting pickled. The reimport of the parent process's __main__ is a bit ugly, but it runs only the code at the top level of the module; the bodies of if __name__ == '__main__' clauses are skipped, because the module is re-imported under a different name. So just keep the main module free of import-time side effects.
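To make the point about top-level code concrete, here is a minimal sketch that imports a small script under a name other than __main__, which is roughly what the child does with the parent's main module. The script contents, the helper import_under_name and the module name "not_main" are made up for illustration; this only mimics the re-import and is not the multiprocessing code path itself.

```python
import importlib.util
import os
import tempfile
import textwrap

# A stand-in for the parent's main script (hypothetical contents).
SOURCE = textwrap.dedent('''
    events = []
    events.append("top-level statement ran")

    def main():
        events.append("main() ran")

    if __name__ == "__main__":
        main()
''')


def import_under_name(source, module_name):
    """Write source to a temporary file and import it as module_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as f:
            f.write(source)
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the top-level code once
        return module


# Imported under another name, the guard is false and main() never runs.
child_view = import_under_name(SOURCE, "not_main")
print(child_view.events)
```

Only the unguarded statement is recorded, so anything expensive that sits at module level (imports, setup calls) is what the child pays for.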

You can prevent your main module from importing anything when the subprocess starts, too (only useful on win32 which, as you note, can't fork). Move the main() and its imports to a separate module, so that the startup script contains only:

if __name__ == '__main__':
    # Runs only in the parent; the child re-imports this script under another name.
    from mainmodule import main
    main()
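For completeness, a minimal sketch of what the separate module might look like. The worker crunch, the default arguments and the pool size are assumptions made for illustration; only the names mainmodule and main come from the answer.

```python
# mainmodule.py (module name taken from the startup script above)
from multiprocessing import Pool


def crunch(x):
    """Hypothetical worker: it receives everything it needs as an argument."""
    return x * x


def main(values=(1, 2, 3), processes=2):
    """Create the pool, map the worker over the inputs and return the results."""
    pool = Pool(processes)
    try:
        return pool.map(crunch, values)
    finally:
        pool.close()
        pool.join()
```

One caveat: functions are pickled by reference, so a worker process imports whichever module defines crunch as soon as it unpickles its first task. If mainmodule pulls in heavy dependencies, it may be worth keeping the worker functions in a separate lightweight module so that the children import only that.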

There is still an implicit import site in the child process startup. It does important initialisation and I don't think mp.forking has an easy way to disable it, but I don't expect it to be expensive anyway.
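If you want to check that expectation on your own machine, a rough measurement can be sketched as follows. It starts a bare interpreter with and without the -S command-line flag, which disables the implicit import of site. The helper startup_seconds and the number of runs are arbitrary choices, and the timings include process creation, so treat the difference as an estimate only.

```python
import subprocess
import sys
import time


def startup_seconds(extra_flags, runs=5):
    """Return the best wall-clock time to start an interpreter that does nothing."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call([sys.executable] + list(extra_flags) + ["-c", "pass"])
        best = min(best, time.perf_counter() - start)
    return best


with_site = startup_seconds([])         # normal startup, site is imported
without_site = startup_seconds(["-S"])  # -S skips the implicit import of site
print("with site:    %.1f ms" % (with_site * 1000))
print("without site: %.1f ms" % (without_site * 1000))
```

This only measures the cost; it does not make the pool's workers skip site.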
