Large numpy arrays in shared memory for multiprocessing: Is something wrong with this approach?


Problem description


Multiprocessing is a wonderful tool, but it is not so straightforward to use large memory chunks with it. You can load chunks in each process and dump results to disk, but sometimes you need to keep the results in memory. And, on top of that, use the fancy numpy functionality.


I have read/googled a lot and came up with some answers:

Use numpy array in shared memory for multiprocessing

Share Large, Read-Only Numpy Array Between Multiprocessing Processes

Python multiprocessing global numpy arrays

How do I pass large numpy arrays between python subprocesses without saving to disk?

And so on.


They all have drawbacks: not-so-mainstream libraries (sharedmem), globally stored variables, not-so-easy-to-read code, pipes, etc.


My goal was to seamlessly use numpy in my workers without worrying about conversions and stuff.


After much trial and error I came up with this. It works on my Ubuntu 16, Python 3.6, 16 GB, 8-core machine. I took a lot of "shortcuts" compared to previous approaches: no global shared state, no pure memory pointers that need to be converted to numpy inside the workers, no large numpy arrays passed as process arguments, and so on.


Pastebin link above, but I will put a few snippets here.

Some imports:

import numpy as np
import multiprocessing as mp
import multiprocessing.sharedctypes
import ctypes

Allocate some shared memory and wrap it into a numpy array:

def create_np_shared_array(shape, dtype, ctype):
    size = int(np.prod(shape))  # total number of elements (elided in the original)
    shared_mem_chunk = mp.sharedctypes.RawArray(ctype, size)
    numpy_array_view = np.frombuffer(shared_mem_chunk, dtype).reshape(shape)
    return numpy_array_view

Create a shared array and put something in it:

src = np.random.rand(*SHAPE).astype(np.float32)
src_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
dst_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
src_shared[:] = src[:]  # some numpy ops accept an 'out' array in which to store results

Spawn the process:

p = mp.Process(target=lengthly_operation,args=(src_shared, dst_shared, k, k + STEP))
p.start()
p.join()
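The worker itself lives in the pastebin; as a self-contained sketch of how the pieces fit together (the worker body `lengthly_operation` shown here is hypothetical, and the whole thing assumes a fork start method, i.e. Linux):

```python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes
import numpy as np

SHAPE = (1000,)

def create_np_shared_array(shape, dtype, ctype):
    size = int(np.prod(shape))  # total number of elements
    shared_mem_chunk = mp.sharedctypes.RawArray(ctype, size)
    return np.frombuffer(shared_mem_chunk, dtype).reshape(shape)

# Hypothetical worker: reads a slice of the shared src, writes into the shared dst.
def lengthly_operation(src, dst, start, end):
    dst[start:end] = np.sqrt(src[start:end])

src_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
dst_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
src_shared[:] = np.random.rand(*SHAPE).astype(np.float32)

p = mp.Process(target=lengthly_operation, args=(src_shared, dst_shared, 0, SHAPE[0]))
p.start()
p.join()
# The parent sees the child's writes, because dst_shared is backed by shared memory.
```

With fork, the `args` tuple is not pickled; the child simply inherits the arrays, which is what makes this pattern cheap.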


Here are some results (see pastebin code for full reference):

Serial version: allocate mem 2.3741257190704346 exec: 17.092209577560425 total: 19.46633529663086 Succes: True
Parallel with trivial np: allocate mem 2.4535582065582275 spawn  process: 0.00015354156494140625 exec: 3.4581971168518066 total: 5.911908864974976 Succes: False
Parallel with shared mem np: allocate mem 4.535916328430176 (pure alloc:4.014216661453247 copy: 0.5216996669769287) spawn process: 0.00015664100646972656 exec: 3.6783478260040283 total: 8.214420795440674 Succes: True

I also ran cProfile (why the 2 extra seconds when allocating the shared memory?) and realized that there are some calls into tempfile.py, {method 'write' of '_io.BufferedWriter' objects}.

Questions

  • Did I do something wrong?
  • Are the (large) arrays pickled back and forth, so that I gain no speedup? Note that the second run (the one using regular np arrays) fails the correctness test.
  • Is there a way to further improve the timing, code clarity, etc.? (With respect to the multiprocessing paradigm.)

Notes

  • I can't use a process pool, because the memory has to be inherited at fork time and cannot be sent as a parameter.
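For what it's worth, with a fork start method a Pool's workers do inherit module-level state created before the pool, so read-only access without pickling is possible even there; writes would still need a sharedctypes-backed array as above. A minimal sketch (names illustrative, assumes Linux/fork):

```python
import multiprocessing as mp
import numpy as np

# Created before the Pool, so fork-based workers inherit it without pickling.
data = np.arange(8, dtype=np.float64)

def worker(i):
    return data[i] * 2  # read-only access to the inherited array

with mp.Pool(2) as pool:
    out = pool.map(worker, range(8))
```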

Answer

Allocation of the shared array is slow, because apparently it is written to disk first, so that it can be shared through mmap. For reference see heap.py and sharedctypes.py. This is why tempfile.py shows up in the profiler. I think the advantage of this approach is that the shared memory is cleaned up in case of a crash, which cannot be guaranteed with POSIX shared memory.
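On later Pythons (3.8+, so not available on the asker's Python 3.6), `multiprocessing.shared_memory` allocates POSIX shared memory directly and avoids the tempfile round-trip; a sketch of the equivalent wrapping:

```python
import numpy as np
from multiprocessing import shared_memory  # Python 3.8+

shape, dtype = (1000,), np.dtype(np.float32)
shm = shared_memory.SharedMemory(create=True, size=int(np.prod(shape)) * dtype.itemsize)
arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
arr[:] = 1.0  # visible to any process attaching via SharedMemory(name=shm.name)
total = float(arr.sum())

del arr       # drop the numpy view before closing the mapping
shm.close()
shm.unlink()  # note: cleanup is explicit, unlike the tempfile-backed approach
```

The trade-off matches the point above: POSIX shared memory is faster to allocate but must be unlinked explicitly, so a crash can leak the segment.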

There is no pickling happening in your code, thanks to fork, and, as you said, the memory is inherited. The reason the second run doesn't work is that the child processes are not allowed to write into the parent's memory. Instead, private pages are allocated on the fly (copy-on-write), only to be discarded when the child process ends.
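That copy-on-write behavior is easy to demonstrate (a minimal sketch, assuming a fork start method):

```python
import multiprocessing as mp
import numpy as np

plain = np.zeros(4)  # ordinary memory: copy-on-write after fork

def worker():
    plain[:] = 1.0  # lands in the child's private pages, not the parent's

p = mp.Process(target=worker)
p.start()
p.join()
# The parent still sees zeros: the child's writes were discarded at exit.
```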

I only have one suggestion: you don't have to specify a ctype yourself; the right type can be figured out from the numpy dtype through np.ctypeslib._typecodes. Or just use c_byte for everything and use the dtype's itemsize to figure out the size of the buffer; it will be cast by numpy anyway.
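A sketch of that suggestion, using the public `np.ctypeslib.as_ctypes_type` (numpy 1.16+) rather than the private `_typecodes` dict:

```python
import multiprocessing as mp
import multiprocessing.sharedctypes
import numpy as np

def create_np_shared_array(shape, dtype):
    dtype = np.dtype(dtype)
    ctype = np.ctypeslib.as_ctypes_type(dtype)  # derive the ctype from the dtype
    size = int(np.prod(shape))
    shared_mem_chunk = mp.sharedctypes.RawArray(ctype, size)
    return np.frombuffer(shared_mem_chunk, dtype).reshape(shape)

arr = create_np_shared_array((4, 3), np.float32)  # no ctypes.c_float at the call site
```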

