Memory alignment for fast FFT in Python using shared arrays


Question

I am writing an image processing app that has to do several things, and do them as close to real time as possible. Acquisition of the data and its processing run in separate processes (mainly for performance reasons). The data itself is quite large (2 MPix 16-bit grayscale images).

I can share arrays between processes as described in this post: How do I pass large numpy arrays between python subprocesses without saving to disk? (I use the shmarray script from the numpy-shared package.) I can run the FFT that ships with NumPy on that data without problems, but it is quite slow.

Calling FFTW would probably be much faster, but to benefit from it fully I am supposed to run my operations on arrays that are memory aligned.

The question: Is there a way to create and share NumPy-like arrays between processes that are, at the same time, guaranteed to be memory aligned?

Recommended answer

The simplest standard trick to get correctly aligned memory is to allocate a bit more than needed and skip the first few bytes if the alignment is wrong. If I remember correctly, NumPy arrays will always be 8-byte aligned, and FFTW requires 16-byte alignment to perform best. So you would simply allocate 8 bytes more than needed, and skip the first 8 bytes if necessary.
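The number of bytes to skip follows directly from the start address. As a small illustration (the helper name bytes_to_skip is hypothetical, not part of NumPy or FFTW), the offset to the next aligned address can be computed with Python's modulo operator, which returns a non-negative result for a positive modulus:

```python
def bytes_to_skip(address, alignment=16):
    """Bytes to skip from `address` to reach the next multiple of `alignment`.

    Hypothetical helper for illustration only. Relies on Python's `%`
    returning a value in [0, alignment) even for a negative left operand.
    """
    return -address % alignment


# An address that is 8-byte aligned but not 16-byte aligned needs 8 bytes.
skip_misaligned = bytes_to_skip(0x1008)
# An address that is already 16-byte aligned needs none.
skip_aligned = bytes_to_skip(0x1010)
```

Because the result is always smaller than the alignment, over-allocating by the alignment itself (16 bytes here) is enough whatever the original address is.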

Edit: This is rather easy to implement. The pointer to the data is available as an integer in the ctypes.data attribute of a NumPy array. Using the shifted block can be achieved by slicing, viewing as a different data type and reshaping. None of these operations copies the data; they all reuse the same buffer.

To allocate a 16-byte aligned 1000x1000 array of 64-bit floating point numbers, we could use this code:

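import numpy

m = n = 1000
dtype = numpy.dtype(numpy.float64)
nbytes = m * n * dtype.itemsize
# Over-allocate by 16 bytes so an aligned start always fits in the buffer.
buf = numpy.empty(nbytes + 16, dtype=numpy.uint8)
# Number of bytes to skip to reach the next 16-byte boundary.
start_index = -buf.ctypes.data % 16
# Slice, reinterpret and reshape: all are views on buf, nothing is copied.
a = buf[start_index:start_index + nbytes].view(dtype).reshape(m, n)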

Now a is an array with the desired properties, as can be verified by checking that a.ctypes.data % 16 is indeed 0.
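The same trick should carry over to a buffer that is shared between processes. The following is only a sketch under stated assumptions: it uses multiprocessing.RawArray from the standard library as the shared buffer instead of the shmarray script mentioned in the question, and the helper names aligned_shared_array and view_shared are hypothetical. The byte offset is returned alongside the buffer so that every process can build its view on exactly the same bytes.

```python
import ctypes
import multiprocessing

import numpy


def aligned_shared_array(shape, dtype=numpy.float64, alignment=16):
    """Hypothetical helper: allocate a shared buffer plus an aligned view.

    Returns (raw, start, arr): the shared ctypes buffer, the byte offset
    of the aligned block inside it, and the NumPy view on that block.
    """
    dtype = numpy.dtype(dtype)
    nbytes = int(numpy.prod(shape)) * dtype.itemsize
    # Over-allocate so that an aligned start always fits inside the buffer.
    raw = multiprocessing.RawArray(ctypes.c_uint8, nbytes + alignment)
    buf = numpy.frombuffer(raw, dtype=numpy.uint8)
    start = -buf.ctypes.data % alignment
    arr = buf[start:start + nbytes].view(dtype).reshape(shape)
    return raw, start, arr


def view_shared(raw, start, shape, dtype=numpy.float64):
    """Hypothetical helper: rebuild the view from the buffer and its offset."""
    dtype = numpy.dtype(dtype)
    nbytes = int(numpy.prod(shape)) * dtype.itemsize
    buf = numpy.frombuffer(raw, dtype=numpy.uint8)
    return buf[start:start + nbytes].view(dtype).reshape(shape)
```

A worker process that receives raw and start (for example as arguments to multiprocessing.Process) could call view_shared to obtain its own view. Whether that view is also 16-byte aligned there depends on the buffer being mapped at an address with the same alignment in that process, so it is worth checking arr.ctypes.data % 16 in each process rather than assuming it.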
