Python multiprocessing shared memory issues with C objects involved

Question

I am working on a program that uses an external C library to parse data from external sources and a Python library to run an optimisation problem on it. The optimisation is very time-consuming, so using several CPUs would be a significant plus.

Basically, I wrapped the C(++) structures with Cython as follows:

cdef class CObject(object):

    cdef long p_sthg      # raw address of the underlying C structure
    cdef OBJECT* sthg     # typed pointer reconstructed from that address

    def __cinit__(self, sthg):
        self.p_sthg = sthg
        self.sthg = <OBJECT*> self.p_sthg

    def __reduce__(self):
        # pickling support: only the raw address is serialised
        return (rebuildObject, (self.p_sthg, ))

    def getManyThings(self):
        ...
        return blahblahblah
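
The rebuildObject helper is omitted above; presumably it is something along the lines of the sketch below, which is why only the raw address survives a pickling round trip:

def rebuildObject(p_sthg):
    # Presumed unpickling helper: rebuilds the wrapper from the raw address
    # alone. In another process that address no longer refers to a live OBJECT.
    return CObject(p_sthg)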

Then I create my resource-intensive process:

import multiprocessing as mp

p = mp.Process(target=make_process, args=(cobject,))

As you can immediately guess (of course I didn't), even though I manage to unpickle the CObject, the pointer is passed to the new process, but not the C structure it refers to.

I can find some resources explaining how to put Python objects into shared memory, but that would not be sufficient in my case, as I would need to share C objects I barely know about (and other objects that are pointed at by the top CObject) between the Python processes.

In case it matters, the good thing is that I can survive with read-only access...

Does anyone have any experience in such matters?
My other idea would be to find a way to write the binary representation of the object I need to pass into a file and read it back from the other process...

Answer

There is no single, general way to do this.

You can put the C object into shared memory by constructing it inside a suitable mmap(2) region (also available via mmap in the Python standard library; use MAP_SHARED|MAP_ANONYMOUS). This requires the entire object to lie within the mmap, and will likely make it impossible for the object to use pointers (but offsets relative to the object are probably OK provided they point within the mmap). If the object has any file descriptors or other handles of any kind, those will almost certainly not work correctly. Note that mmap() is like malloc(); you have to do a corresponding munmap() or you leak the memory.
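
To make that concrete, here is a minimal sketch using the mmap and struct modules from the standard library. The two-field RECORD and the worker function are purely illustrative stand-ins for a flat, pointer-free C structure, and the example assumes the fork start method (the Linux default) so the mapping is simply inherited by the child:

import mmap
import struct
import multiprocessing as mp

# Hypothetical pointer-free record standing in for the C struct: a long and a double.
RECORD = struct.Struct("ld")

def worker(shm):
    # The child inherits the anonymous shared mapping across fork() and can
    # read the record in place; nothing is pickled.
    ident, value = RECORD.unpack_from(shm, 0)
    print("worker sees", ident, value)

if __name__ == "__main__":
    # MAP_SHARED | MAP_ANONYMOUS region, as suggested above (Unix-only constants).
    shm = mmap.mmap(-1, RECORD.size,
                    flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    RECORD.pack_into(shm, 0, 42, 3.14)

    p = mp.Process(target=worker, args=(shm,))
    p.start()
    p.join()
    shm.close()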

You could copy the C object into shared memory (with e.g. memcpy(3)). This is likely less efficient, and requires the object to be reasonably copiable. memcpy does not magically fix up pointers and other references. On the plus side, this does not require you to control the object's construction.
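
As a rough Python-level illustration of the copy approach, ctypes.memmove can stand in for memcpy(3); the Sthg structure below is hypothetical and assumed to be flat and pointer-free:

import ctypes
import mmap

# Hypothetical flat C struct mirrored with ctypes.
class Sthg(ctypes.Structure):
    _fields_ = [("ident", ctypes.c_long), ("value", ctypes.c_double)]

obj = Sthg(42, 3.14)
shm = mmap.mmap(-1, ctypes.sizeof(Sthg),
                flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)

# Overlay the structure type on the shared region and copy the bytes across.
# Any pointer fields inside the struct would still be meaningless elsewhere.
dst = Sthg.from_buffer(shm)
ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(obj), ctypes.sizeof(Sthg))
print(dst.ident, dst.value)   # 42 3.14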

You can serialize the object to some binary representation and pass it through a pipe(2) (also available via os.pipe() in Python). For simple cases, this is a field-by-field copy, but again, pointers will need care. You will have to (un)swizzle your pointers to make them work correctly after (de)serialization. This is the most easily generalized technique, but requires knowledge of how the object is structured, or a black-box function that does the serialization for you.
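
A minimal sketch of the pipe route, again with a hypothetical flat record and the fork start method: the parent packs the fields into bytes with struct, writes them to os.pipe(), and the child rebuilds them on the other side. Real pointers would first have to be converted into something position-independent, such as an index or an offset:

import os
import struct
import multiprocessing as mp

RECORD = struct.Struct("ld")   # hypothetical flat record: a long and a double

def child(read_fd):
    # Rebuild the record field by field from the serialised bytes.
    ident, value = RECORD.unpack(os.read(read_fd, RECORD.size))
    print("child rebuilt", ident, value)

if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    p = mp.Process(target=child, args=(read_fd,))
    p.start()
    os.write(write_fd, RECORD.pack(42, 3.14))
    os.close(write_fd)
    p.join()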

Finally, you can create temporary files in /dev/shm and exchange information that way. These files are backed by RAM and are effectively the same as shared memory, but with perhaps a more familiar file-like interface. But this is Unix-only. On systems other than Linux, you should use shm_open(3) for full portability.
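
For completeness, a sketch of the /dev/shm variant (the path is Linux-specific): the file lives in RAM-backed tmpfs, one process maps and writes it, and any other process that opens the same path sees the same bytes:

import mmap
import os
import tempfile

fd, path = tempfile.mkstemp(dir="/dev/shm")   # RAM-backed tmpfs on Linux
try:
    os.ftruncate(fd, 4096)          # give the file a size before mapping it
    shm = mmap.mmap(fd, 4096)       # MAP_SHARED is the default
    shm[:5] = b"hello"              # one process writes...

    with open(path, "rb") as f:     # ...another process would read the same path
        print(f.read(5))            # b'hello'
    shm.close()
finally:
    os.close(fd)
    os.unlink(path)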

Note that shared memory, in general, tends to be problematic. It requires inter-process synchronization, but the necessary locking primitives are far less developed than in the threading world. I recommend limiting shared memory to immutable objects or inherently lock-free designs (which are quite difficult to get right).
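
By way of illustration, one way to guard read-modify-write access to a shared mapping is a multiprocessing.Lock; the counter layout and helper names below are hypothetical, and the sketch again assumes the fork start method:

import mmap
import struct
import multiprocessing as mp

COUNTER = struct.Struct("l")

def bump(shm, lock, n):
    # Without the lock, concurrent increments of the shared counter get lost.
    for _ in range(n):
        with lock:
            (value,) = COUNTER.unpack_from(shm, 0)
            COUNTER.pack_into(shm, 0, value + 1)

if __name__ == "__main__":
    shm = mmap.mmap(-1, COUNTER.size,
                    flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    COUNTER.pack_into(shm, 0, 0)
    lock = mp.Lock()
    workers = [mp.Process(target=bump, args=(shm, lock, 10000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(COUNTER.unpack_from(shm, 0)[0])   # 40000, only because of the lock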
