Multiprocessing in Python with read-only shared memory?

Question

I have a single-threaded Python program, and I'd like to modify it to make use of all 32 processors on the server it runs on. As I envision it, each worker process would receive its job from a queue and submit its output to a queue. To complete its work, however, each worker process would need read-only access to a complex in-memory data structure--many gigabytes of dicts and objects that link to each other. In python, is there a simple way to share this data structure, without making a copy of it for each worker process?

Thanks.

Answer

If you are using the CPython (or PyPy) implementation of Python, then the global interpreter lock (GIL) will prevent more than one thread from operating on Python objects at a time.

So if you are using such an implementation, you'll need to use multiple processes instead of multiple threads to take advantage of your 32 processors.

You could use the standard library's multiprocessing or concurrent.futures modules to spawn the worker processes. There are also many third-party options. Doug Hellmann's tutorial is a great introduction to the multiprocessing module.
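For instance, here is a minimal sketch of the job-queue/result-queue worker layout described in the question, using multiprocessing; the names worker and process_job and the toy job payloads are illustrative placeholders, not part of the original question.

```python
import multiprocessing as mp

def process_job(job):
    # Placeholder for the real per-job computation.
    return job * 2

def worker(job_queue, result_queue):
    # Each worker pulls jobs until it sees a None sentinel, then exits.
    while True:
        job = job_queue.get()
        if job is None:
            break
        result_queue.put(process_job(job))

if __name__ == '__main__':
    job_queue = mp.Queue()
    result_queue = mp.Queue()
    workers = [mp.Process(target=worker, args=(job_queue, result_queue))
               for _ in range(32)]
    for w in workers:
        w.start()

    jobs = list(range(100))
    for job in jobs:
        job_queue.put(job)
    for _ in workers:            # one sentinel per worker
        job_queue.put(None)

    results = [result_queue.get() for _ in jobs]
    for w in workers:
        w.join()
```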

Since you only need read-only access to the data structure, if you assign the complex data structure to a global variable before you spawn the processes, then all the processes will have access to this global variable.

When you spawn a process, the globals from the calling module are copied to the spawned process. However, on Linux, which has copy-on-write, the very same data structures are used by the spawned processes, so no extra memory is required. Only when a process modifies the data structure is it copied to a new location.
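Here is a minimal sketch of that approach, assuming a Linux host where the fork start method (and therefore copy-on-write) applies; BIG_DATA and lookup are illustrative names, and the toy dictionary stands in for the real multi-gigabyte structure.

```python
import multiprocessing as mp

# Build the large read-only structure once, at module level,
# before any worker processes are created.
BIG_DATA = {i: {'square': i * i} for i in range(1_000_000)}

def lookup(key):
    # Workers read the inherited global directly. Under fork, the parent's
    # memory pages are shared copy-on-write, so the structure is not
    # duplicated up front for each worker.
    return BIG_DATA[key]['square']

if __name__ == '__main__':
    mp.set_start_method('fork')      # the Linux default; unavailable on Windows
    with mp.Pool(processes=32) as pool:
        print(pool.map(lookup, range(10)))
```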

On Windows, since there is no fork, each spawned process calls Python and re-imports the calling module, so each process requires memory for its own separate copy of the huge data structure. There must be some other way to share data structures on Windows, but I'm unaware of the details. (POSH may be a solution to the shared-memory problem, but I haven't tried it myself.)
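As a quick check of which behaviour a given platform will use, the start method can be inspected at runtime; this snippet is illustrative and not from the original answer.

```python
import multiprocessing as mp

# 'fork' (the Linux default) inherits globals copy-on-write;
# 'spawn' (the Windows default, and the macOS default since Python 3.8)
# starts a fresh interpreter and re-imports the module, so each worker
# builds its own copy of module-level data.
print(mp.get_start_method())
```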
