两个三维数组的卷积,一侧填充太慢 [英] Convolution of two three dimensional arrays with padding on one side too slow

查看:120
本文介绍了两个三维数组的卷积,一侧填充太慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我当前的项目中,我需要以一种稍微不同寻常的方式来"卷积"两个三维阵列:

In my current project I need to "convolve" two three dimensional arrays in a slightly unusual way:

假设我们有两个三维数组A和B,其维度为dimA和dimB(每个轴都相同).现在,我们要为每个轴创建尺寸为dimA + dimB的第三个数组C.

Assume we have two three dimensional arrays A and B with the dimensions dimA and dimB (same for every axis). Now we want to create a third array C with the dimensions dimA+dimB for every axis.

C的项计算为:

c_{x1+x2,y1+y2,z1+z2} += a_{x1,y1,z1} * b_{x2,y2,z2}

我当前的版本很简单:

dimA = A.shape[0]
dimB = B.shape[0]
dimC = dimA+dimB

C = np.zeros((dimC,dimC,dimC))
for x1 in range(dimA):
    for x2 in range(dimB):
        for y1 in range(dimA):
            for y2 in range(dimB):
                for z1 in range(dimA):
                    for z2 in range(dimB):
                        x = x1+x2
                        y = y1+y2
                        z = z1+z2
                        C[x,y,z] += A[x1,y1,z1] * B[x2,y2,z2] 

不幸的是,此版本确实很慢并且不可用.

Unfortunately, this version is really slow and not usable.

我的第二个版本是:

C = scipy.signal.fftconvolve(A,B,mode="full")

但是这仅计算元素max(dimA,dimB)

有人有更好的主意吗?

推荐答案

一般规则是对作业使用正确的算法,除非卷积核比数据短,否则它是基于FFT的卷积(短大致表示小于log2(n),其中n是数据的长度.

The general rule is to use the correct algorithm for the job, which, unless the convolution kernel is short compared to the data, is an FFT based convolution (short roughly means less than log2(n) where n is the length of the data).

在您的情况下,由于要对两个相等大小的数据集进行卷积,因此您可能需要考虑基于FFT的卷积.

In your case, since you're convolving two equal sizes datasets, you probably want to be considering an FFT based convolution.

很明显,scipy.signal.fftconvolve在这方面是不足的.使用更快的FFT算法,通过滚动自己的卷积例程可以做得更好(fftconvolve不能将变换大小强制为2的幂,否则可能会被猴子修补).

Clearly, scipy.signal.fftconvolve is being a touch deficient in this regard. Using a faster FFT algorithm, you can do much better by rolling your own convolution routine (it doesn't help that fftconvolve forces the transform size to powers of two, otherwise it could be monkey patched).

以下代码使用 pyfftw ,这是我对 FFTW 并创建一个自定义卷积类CustomFFTConvolution:

The following code uses pyfftw, my wrappers around FFTW and creates a custom convolution class, CustomFFTConvolution:

class CustomFFTConvolution(object):

    def __init__(self, A, B, threads=1):

        shape = (np.array(A.shape) + np.array(B.shape))-1

        if np.iscomplexobj(A) and np.iscomplexobj(B):
            self.fft_A_obj = pyfftw.builders.fftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.fftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.ifftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

        else:
            self.fft_A_obj = pyfftw.builders.rfftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.rfftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.irfftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

    def __call__(self, A, B):

        fft_padded_A = self.fft_A_obj(A)
        fft_padded_B = self.fft_B_obj(B)

        return self.ifft_obj(fft_padded_A * fft_padded_B)

这用作:

custom_fft_conv = CustomFFTConvolution(A, B)
C = custom_fft_conv(A, B) # This can contain different values to during construction

构造类时,

带有可选的threads参数.创建类的目的是受益于FFTW预先计划转换的功能.

with an optional threads argument when constructing the class. The purpose of creating a class is to benefit from the ability of FFTW to plan the transform in advance.

下面的完整演示代码只是扩展了 @Kelsey的答案,以用于计时等等.

The full demo code below simply extends @Kelsey's answer for the timing and so on.

在numba解决方案和vanilla fftconvolve解决方案上,提速都是相当可观的.对于n = 33,它的速度大约是两者的40-45倍.

The speedup is substantial over both the numba solution and the vanilla fftconvolve solution. For n = 33, it's about 40-45x faster than both.

from timeit import Timer
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import fftconvolve
from numba import jit, double
import pyfftw

# Original code
def custom_convolution(A, B):

    dimA = A.shape[0]
    dimB = B.shape[0]
    dimC = dimA + dimB

    C = np.zeros((dimC, dimC, dimC))
    for x1 in range(dimA):
        for x2 in range(dimB):
            for y1 in range(dimA):
                for y2 in range(dimB):
                    for z1 in range(dimA):
                        for z2 in range(dimB):
                            x = x1 + x2
                            y = y1 + y2
                            z = z1 + z2
                            C[x, y, z] += A[x1, y1, z1] * B[x2, y2, z2]
    return C

# Numba'ing the function with the JIT compiler
numba_convolution = jit(double[:, :, :](double[:, :, :],
                        double[:, :, :]))(custom_convolution)

def fft_convolution(A, B):
    return fftconvolve(A, B, mode='full')

class CustomFFTConvolution(object):

    def __init__(self, A, B, threads=1):

        shape = (np.array(A.shape) + np.array(B.shape))-1

        if np.iscomplexobj(A) and np.iscomplexobj(B):
            self.fft_A_obj = pyfftw.builders.fftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.fftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.ifftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

        else:
            self.fft_A_obj = pyfftw.builders.rfftn(
                    A, s=shape, threads=threads)
            self.fft_B_obj = pyfftw.builders.rfftn(
                    B, s=shape, threads=threads)
            self.ifft_obj = pyfftw.builders.irfftn(
                    self.fft_A_obj.get_output_array(), s=shape,
                    threads=threads)

    def __call__(self, A, B):

        fft_padded_A = self.fft_A_obj(A)
        fft_padded_B = self.fft_B_obj(B)

        return self.ifft_obj(fft_padded_A * fft_padded_B)

def run_test():
    reps = 10
    nt, ft, cft, cft2 = [], [], [], []
    x = range(2, 34)

    for N in x:
        print N
        A = np.random.rand(N, N, N)
        B = np.random.rand(N, N, N)

        custom_fft_conv = CustomFFTConvolution(A, B)
        custom_fft_conv_nthreads = CustomFFTConvolution(A, B, threads=2)

        C1 = numba_convolution(A, B)
        C2 = fft_convolution(A, B)
        C3 = custom_fft_conv(A, B)
        C4 = custom_fft_conv_nthreads(A, B)

        assert np.allclose(C1[:-1, :-1, :-1], C2)
        assert np.allclose(C1[:-1, :-1, :-1], C3)
        assert np.allclose(C1[:-1, :-1, :-1], C4)

        t = Timer(lambda: numba_convolution(A, B))
        nt.append(t.timeit(number=reps))
        t = Timer(lambda: fft_convolution(A, B))
        ft.append(t.timeit(number=reps))
        t = Timer(lambda: custom_fft_conv(A, B))
        cft.append(t.timeit(number=reps))
        t = Timer(lambda: custom_fft_conv_nthreads(A, B))
        cft2.append(t.timeit(number=reps))

    plt.plot(x, ft, label='scipy.signal.fftconvolve')
    plt.plot(x, nt, label='custom numba convolve')
    plt.plot(x, cft, label='custom pyfftw convolve')
    plt.plot(x, cft2, label='custom pyfftw convolve with threading')        
    plt.legend()
    plt.show()

if __name__ == '__main__':
    run_test()

编辑:最近的scipy做得更好,但并非总是如此填充到2个长度的幂,因此在输出上更接近pyFFTW情况.

More recent scipy does a better job of not always padding to powers of 2 length so is closer in output to the pyFFTW case.

这篇关于两个三维数组的卷积,一侧填充太慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆