Python numpy.fft的变化幅度很大 [英] Python numpy.fft changes strides

查看:229
本文介绍了Python numpy.fft的变化幅度很大的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尊敬的stackoverflow社区!

Dear stackoverflow community!

今天,我发现在高端集群体系结构上,两个1921 x 512 x 512尺寸的多维数据集的元素相乘需要〜27 s.这太长了,因为在当前实现中,我必须至少执行256次这样的计算才能对功率谱进行方位平均.我发现性能下降主要是由于步幅结构不同(一种情况下为C,另一种情况下为FORTRAN).这两个数组之一是新生成的布尔网格(C顺序),另一个数组(FORTRAN顺序)来自3D ndarray.copy() FT网格),大约可以达到4s,这是一个巨大的进步.

Today I found that on a high-end cluster architecture, an elementwise multiplication of 2 cubes with dimensions 1921 x 512 x 512 takes ~ 27 s. This is much too long since I have to perform such computations at least 256 times for an azimuthal averaging of a power spectrum in the current implementation. I found that the slow performance was mainly due to different stride structures (C in one case and FORTRAN in the other). One of the two arrays was a newly generated boolean grid (C order) and the other one (FORTRAN order) came from the 3D numpy.fft.fftn() Fourier transform of an input grid (C order). Any reasons why numpy.fft.fftn() changes the strides and ideas on how to prevent that except for reversing the axes (which would be just a workaround)? With similar strides (ndarray.copy() of the FT grid), ~ 4s are achievable, a tremendous improvement.

因此,问题如下:

考虑数组:

ran = np.random.rand(1921, 512, 512)
ran.strides
(2097152, 4096, 8)

a = np.fft.fftn(ran)
a.strides
(16, 30736, 15736832)

我们可以看到步幅结构是不同的.如何避免这种情况(不使用a = np.fft.fftn(ran,axes =(1,0)))?是否还有其他可能影响步幅结构的numpy数组例程?在这些情况下该怎么办?

As we can see the stride structure is different. How can this be prevented (without using a = np.fft.fftn(ran, axes = (1,0)))? Are there any other numpy array routines that could affect stride structure? What can be done in those cases?

一如既往地非常有用的建议!

Helpful advice is as usual much appreciated!

推荐答案

除了numpy.fft.fftn()改变轴的步幅和思路外,是否有任何原因,除了反转轴(这只是一种解决方法)之外?

Any reasons why numpy.fft.fftn() changes the strides and ideas on how to prevent that except for reversing the axes (which would be just a workaround)?

计算数组的多维DFT包括在每个维度上连续计算1D DTF.有两种策略:

Computing the multidimensionnal DFT of an array consists in successively computing 1D DTFs over each dimensions. There are two strategies:

  1. 将1D DTF计算限制为连续的1D数组.由于阵列是连续的,因此可以减少与延迟/缓存未命中有关的问题.这种策略的主要缺点是:数组要在每个维度之间进行转置. numpy.fft可能采用了该策略.在计算结束时,已对数组进行了转置.为避免不必要的计算,返回了转置的数组并修改了步幅.
  2. 为跨步数组启用一维DDFT计算.这可能会引发一些与延迟有关的问题.这是fftw的策略,可通过接口pyfftw使用.结果,输出数组的步幅与输入数组的步幅相同.
  1. Restrict 1D DTF computations to contiguous 1D arrays. As the array is contiguous, problem related to latency/cache misses will be reduced. This strategy has a major drawback: the array is to be transposed between each dimension. It is likely the strategy adopted by numpy.fft. At the end of computations, the array has been transposed. To avoid unnecessary computations, the transposed array is returned and strides are modified.
  2. Enable 1D DDFT computations for strided arrays. This might trigger some problem related to latency. It is the strategy of fftw, avaible through the interface pyfftw. As a result, the output array features the same strides as the input array.

numpy.fftnpyfftw.numpy.fftn进行计时

Timing numpy.fftn and pyfftw.numpy.fftn as performed here and there or there will tell you whether FFTW is really the Fastest Fourier Transform in the West or not...

  • To check that numpy uses the first strategy, take a look at numpy/fft/fftpack.py. At line 81-85, the call to work_function(a, wsave) (i.e. fftpack.cfftf, from FFTPACK, arguments documented there) is enclosed between calls to numpy.swapaxes() performing the transpositions.

scipy.fftpack.fftn似乎并没有改变步伐...不过,似乎它利用了第一个策略. scipy.fftpack.fftn() 调用 scipy.fftpack.zfftnd() ,它调用 进行换位.

scipy.fftpack.fftn does not seem to change the strides... Nevertheless, it seems that it makes use of the first strategy. scipy.fftpack.fftn() calls scipy.fftpack.zfftnd() which calls zfft(), based on zfftf1 which does not seem to handle strided DFTs. Moreover, zfftnd() calls many times the function flatten() which performs the transposition.

另一个示例:对于并行分布式内存多维DFT, FFTW-MPI使用第一种策略来避免一维DTF期间进程之间的任何MPI通信.当然,用于转置数组的函数距离不远,该过程涉及很多MPI通信.

Another example: for parallel distributed memory multidimensionnal DFTs, FFTW-MPI uses the first strategy to avoid any MPI communications between processes during 1D DTFs. Of course, functions to transpose the array are not far away and a lot a MPI communications are involved in the process.

是否还有其他可能影响步幅结构的numpy数组例程?在这些情况下该怎么办?

Are there any other numpy array routines that could affect stride structure? What can be done in those cases?

您可以搜索swapaxes 的numpy的github存储库:此功能仅使用了几次.因此,在我看来,这种大步变化"是fft.fftn()特有的,大多数numpy函数都保持大步不变.

You can search the github repository of numpy for swapaxes: this funtion is only used a couple of times. Hence, to my mind, this "change of strides" is particular to fft.fftn() and most numpy functions keep the strides unchanged.

最后,大步改变"是第一个策略的一个特征,没有办法阻止它.唯一的解决方法是在计算结束时交换轴,这很昂贵.但是您可以依赖pyfftw,因为fftw以非常有效的方式实现了第二个策略.如果不同阵列的步幅一致,则DFT计算将更快,随后的计算也将更快.

Finally, the "change of strides" is a feature of the first strategy and there is no way to prevent that. The only workaround is to swap the axes at the end of the computation, which is costly. But you can rely on pyfftw since fftw implements the second strategy in a very efficient way. The DFT computations will be faster, and subsequent computations will be faster as well if the strides of the different arrays become consistent.

这篇关于Python numpy.fft的变化幅度很大的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆