Does multiprocessing copy the object in this scenario?

Question

import ctypes
import multiprocessing

import numpy as np

class Test():
    def __init__(self):
        # 100 doubles in shared memory, exposed as a NumPy array
        shared_array_base = multiprocessing.Array(ctypes.c_double, 100, lock=False)
        self.a = np.ctypeslib.as_array(shared_array_base)

    def my_fun(self, i):
        self.a[i] = 1

if __name__ == "__main__":
    num_cores = multiprocessing.cpu_count()

    t = Test()

    def my_fun_wrapper(i):
        t.my_fun(i)

    with multiprocessing.Pool(num_cores) as p:
        p.map(my_fun_wrapper, np.arange(100))

    print(t.a)

In the code above, I'm trying to use multiprocessing to modify an array. The function my_fun(), executed in each process, should modify the value of the array a[:] at the index i which is passed to my_fun() as a parameter. With regard to the code above, I would like to know what is being copied.

1) Is anything in the code being copied by each process? I think the object might be, but ideally nothing is.

2) Is there a way to get around using a wrapper function my_fun() for the object?

Answer

Almost everything in your code is getting copied, except the shared memory you allocated with multiprocessing.Array. multiprocessing is full of unintuitive, implicit copies.

When you spawn a new process in multiprocessing, the new process needs its own version of just about everything in the original process. This is handled differently depending on platform and settings, but we can tell you're using "fork" mode, because your code wouldn't work in "spawn" or "forkserver" mode - you'd get an error about the workers not being able to find my_fun_wrapper. (Windows only supports "spawn", so we can tell you're not on Windows.)
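The spawn-mode failure can be reproduced without multiprocessing at all: pickle refuses to serialize a function that is not reachable under a top-level module name, which is exactly the situation my_fun_wrapper is in when it's defined inside the `if __name__ == "__main__":` block. A minimal sketch using plain pickle (the nested function here is a stand-in for my_fun_wrapper):

```python
import pickle

def make_wrapper():
    # Stands in for my_fun_wrapper: a function defined inside another
    # scope, so it has no importable top-level name for pickle to record.
    def my_fun_wrapper(i):
        return i
    return my_fun_wrapper

try:
    pickle.dumps(make_wrapper())
except (pickle.PicklingError, AttributeError) as e:
    # e.g. "Can't pickle local object 'make_wrapper.<locals>.my_fun_wrapper'"
    print("cannot pickle:", e)
```

Under "spawn" or "forkserver", this is the same lookup the worker performs when it tries to reconstruct the task's callable, hence the error.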

In "fork" mode, this initial copy is made by using the fork system call to ask the OS to copy the entire process and everything inside it. The memory allocated by multiprocessing.Array is sort of "external" and isn't copied, but most other things are. (There's also a copy-on-write optimization, but copy-on-write still behaves as if everything was copied, and the optimization doesn't work very well in Python because of reference-count updates.)

When you dispatch tasks to worker processes, multiprocessing needs to make even more copies. Any arguments, and the callable for the task itself, are objects in the master process, and objects inherently exist in only one process. The workers can't access any of that. They need their own versions. multiprocessing handles this second round of copies by pickling the callable and arguments, sending the serialized bytes over interprocess communication, and unpickling the pickles in the worker.
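The copy semantics of that second round can be seen with plain pickle: a round-trip produces an equal but distinct object, which is what each worker receives for every argument. A minimal sketch (the dict is a hypothetical stand-in for an argument passed to p.map):

```python
import pickle

task_arg = {"index": 3, "value": 1.0}   # stand-in for a task argument
wire = pickle.dumps(task_arg)           # the bytes sent over the pipe
received = pickle.loads(wire)           # what the worker reconstructs

print(received == task_arg)   # True: same value
print(received is task_arg)   # False: a separate object, as it would be
                              # in a separate process
```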

When the master pickles my_fun_wrapper, the pickle just says "look for the my_fun_wrapper function in the __main__ module", and the workers look up their version of my_fun_wrapper to unpickle it. my_fun_wrapper looks for a global t, and in the workers, that t was produced by the fork, and the fork produced a t with an array backed by the shared memory you allocated with your original multiprocessing.Array call.
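You can verify that pickling a module-level function stores only a name reference, not the function's code: the payload literally contains the name, and unpickling it in the same process hands back the very same object that a by-name lookup finds.

```python
import pickle

def my_fun_wrapper(i):   # top-level, so pickle can record it by name
    return i

data = pickle.dumps(my_fun_wrapper)
print(b"my_fun_wrapper" in data)             # True: the name is in the payload
print(pickle.loads(data) is my_fun_wrapper)  # True: resolved by lookup, not rebuilt
```

In a worker, the same lookup finds the worker's own my_fun_wrapper, which closes over the worker's fork-inherited t.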

On the other hand, if you try to pass t.my_fun to p.map, then multiprocessing has to pickle and unpickle a method object. The resulting pickle doesn't say "look up the t global variable and get its my_fun method". The pickle says to build a new Test instance and get its my_fun method. The pickle doesn't have any instructions in it about using the shared memory you allocated, and the resulting Test instance and its array are independent of the original array you wanted to modify.
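That failure mode is easy to demonstrate with plain pickle and no shared memory at all; a plain list stands in for the array here:

```python
import pickle

class Test:
    def __init__(self):
        self.a = [0.0] * 4   # plain list standing in for the shared array
    def my_fun(self, i):
        self.a[i] = 1.0

t = Test()
# What a worker would reconstruct from a pickled bound method:
m = pickle.loads(pickle.dumps(t.my_fun))
m(0)

print(t.a[0])            # 0.0   -- the original instance is untouched
print(m.__self__ is t)   # False -- the method is bound to a brand-new Test
print(m.__self__.a[0])   # 1.0   -- only the reconstructed copy was modified
```

Calling the round-tripped method mutates a fresh Test, exactly as the workers would if you handed t.my_fun to p.map.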

I know of no good way to avoid needing some sort of wrapper function.
