Sharing numpy arrays between multiple processes without inheritance


Problem Description


I would like to share numpy arrays between multiple processes. There are working solutions here. However they all pass the arrays to the child process through inheritance, which does not work for me because I have to start a few worker processes beforehand and I don't know how many arrays I'm going to deal with later on. Is there any way to create such arrays after the process is started and pass these arrays to the processes via queues?

Btw for some reason I'm not able to use multiprocessing.Manager.

Solution

You should use shared memory, which solves exactly this use case. You keep memory read/write speed, and all processes can read from and write to the array in shared memory without incurring any serialization or transport cost.

Below is the example from the official Python docs:

>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8])  # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:]  # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name  # We did not specify a name so one was chosen for us
'psm_21467_46075'

>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='psm_21467_46075')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([1, 1, 2, 3, 5, 8])
>>> c[-1] = 888
>>> c
array([  1,   1,   2,   3,   5, 888])

>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([  1,   1,   2,   3,   5, 888])

>>> # Clean up from within the second Python shell
>>> del c  # Unnecessary; merely emphasizing the array is no longer used
>>> existing_shm.close()

>>> # Clean up from within the first Python shell
>>> del b  # Unnecessary; merely emphasizing the array is no longer used
>>> shm.close()
>>> shm.unlink()  # Free and release the shared memory block at the very end

For a real use case like yours, you would pass the name shm.name, along with the array's shape and dtype, through a Pipe, Queue, or any other multiprocessing communication mechanism. Note that only this tiny bit of metadata needs to be exchanged between processes; the actual data stays in the shared memory space.
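As an illustration, here is a minimal sketch (not part of the original answer) of what that could look like: pre-started workers receive (name, shape, dtype) tuples through a multiprocessing.Queue, attach to the block by name, and modify the array in place. The worker function and the None sentinel protocol are illustrative assumptions, not a fixed API.

import numpy as np
from multiprocessing import Process, Queue, shared_memory

def worker(q):
    while True:
        msg = q.get()
        if msg is None:  # sentinel: no more arrays to process
            break
        name, shape, dtype = msg
        shm = shared_memory.SharedMemory(name=name)  # attach by name
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr *= 2  # modify the shared data in place
        shm.close()  # detach from this process; do not unlink here

if __name__ == "__main__":
    q = Queue()
    workers = [Process(target=worker, args=(q,)) for _ in range(2)]
    for p in workers:
        p.start()

    # Create a shared array only after the workers are already running
    a = np.array([1, 1, 2, 3, 5, 8])
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    b[:] = a[:]

    q.put((shm.name, a.shape, a.dtype.name))  # only metadata crosses the queue
    for _ in workers:
        q.put(None)  # tell every worker to exit
    for p in workers:
        p.join()

    print(b)  # reflects the worker's in-place change: [ 2  2  4  6 10 16]
    shm.close()
    shm.unlink()  # free the block once all processes are done with it

Note the division of responsibility in the sketch: every process calls close() on the block when it is done with it, but only the creating process calls unlink(), and only after all other processes have finished with the data.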

