Python快速实现3D数组的卷积/互相关 [英] Python Fast Implementation of Convolution/Cross-correlation of 3D arrays

查看:218
本文介绍了Python快速实现3D数组的卷积/互相关的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究计算3D图像的卷积(互相关).由于问题的性质,不需要基于FFT的卷积近似值(例如,scipy fftconvolve),并且直接和"是可行的方法.图像的大小约为〜(150,150,150),最大的内核大小约为〜(40,40,40).图像是周期性的(具有周期性边界条件,或需要用同一图像填充),因为必须为一次分析进行约100次此类卷积,所以卷积函数的速度至关重要.

I'm working on calculating convolutions (cross-correlation) of 3D images. Due to the nature of the problem, FFT based approximations of convolution (e.g. scipy fftconvolve) is not desired, and the "direct sum" is the way to go. The images are ~(150, 150, 150) in size, and the largest kernels are ~(40, 40, 40) in size. the images are periodic (has periodic boundary condition, or needs to be padded by the same image) since ~100 such convolutions has to be done for one analysis, the speed of the convolution function is critical.

我已经实现并测试了几个功能,包括使用"method = direct"进行卷积的科学实现,结果如下所示.我使用了(100,100,100)图像和(7,7,7)内核来对这里的方法进行基准测试.

I have implemented and tested several functions, including the scipy implementation of convolve with "method = direct", and the results are shown below. I used a (100, 100, 100) image and a (7, 7, 7) kernel to benchmark the methods here:

import numpy as np
import time
from scipy import signal
image = np.random.rand(Nx,Ny,Nz)
kernel = np.random.rand(3,5,7)

signal.convolve(image,kernel, mode='same',method = "direct")

拍摄:8.198秒

然后我基于数组加法编写了自己的函数

I then wrote my own function based on array addition

def shift_array(array, a,b,c):
    A = np.roll(array,a,axis = 0)
    B = np.roll(A,b,axis = 1)
    C = np.roll(B,c,axis = 2)
    return C

def matrix_convolve2(image,kernel, mode = "periodic"):
    if mode not in ["periodic"]:
        raise NotImplemented
    if mode is "periodic":
        Nx, Ny, Nz = image.shape
        nx, ny, nz = kernel.shape
        rx = nx//2
        ry = ny//2
        rz = nz//2
        result = np.zeros((Nx, Ny, Nz))
        for i in range(nx):
            for j in range(ny):
                for k in range(nz):
                    result += kernel[i,j,k] * shift_array(image, rx-i, ry-j, rz-k) 
        return result


matrix_convolve2(image,kernel)

拍摄:6.324秒

在这种情况下,似乎限制因素是周期性边界条件的np.roll函数,因此我尝试通过整理输入图像来规避此问题

It seems in this case the limiting factor here is the np.roll function for periodic boundary condition, so I tried to circumvented this by tilling up the input image

def matrix_convolve_center(image,kernel):
    # Only get convolve result for the "central" block
    nx, ny, nz = kernel.shape
    rx = nx//2
    ry = ny//2
    rz = nz//2
    result = np.zeros((Nx, Ny, Nz))
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                result += kernel[i,j,k] * image[Nx+i-rx:2*Nx+i-rx,Ny+j-ry:2*Ny+j-ry,Nz+k-rz:2*Nz+k-rz]
    return result

def matrix_convolve3(image,kernel):

    Nx, Ny, Nz = image.shape
    nx, ny, nz = kernel.shape

    extended_image = np.tile(image,(3,3,3))
    result = matrix_convolve_center(extended_image,kernel,Nx, Ny, Nz)
    return result

matrix_convolve3(image,kernel)

拍摄:2.639秒

到目前为止,这种方法可提供最佳性能,但对于实际应用而言仍然太慢.

This approach gives the best performance so far, but still way too slow for actual application.

我做了一些研究,似乎使用"Numba"可以显着提高性能,或者也许以并行方式编写相同的功能也可以有所帮助,但是我不喜欢Numba,也不是python并行化(我曾经multiprocess库的一些不好的经验……似乎跳过了迭代或有时突然停止了)

I did some research, and it seems using "Numba" could significantly improve the performance, or maybe write the same function in a parallel way could help too, but I'm not farmiliar with Numba, nor python parallelization (I had some bad experience with the multiprocess library... it seemed to skip iterations or suddenly stop sometimes)

你们能在这里帮助我吗?任何改进将不胜感激.非常感谢!

Could you guys help me here? Any improvement would be greatly appreciated. Thanks a lot!

推荐答案

这还没有定论,但是对于我检查的示例,fft确实比幼稚的(顺序)求和更为准确.因此,除非您有充分的理由相信您的数据有所不同,否则我的建议是:避免麻烦并使用fft.

This is far from conclusive but for the examples I checked fft is indeed more accurate than naive (sequential) summation. So, unless you have good reason to believe that your data are somehow different, my recommendation would be: Save yourself the trouble and use fft.

更新:添加了我自己的直接方法,请确保使用成对求和.设法比fft准确一些,但仍然很慢.

UPDATE: Added my own direct method, taking care to ensure it uses pairwise summation. This manages to be a bit more accurate than fft, but is still very slow.

测试脚本:

import numpy as np
from scipy import stats, signal, fftpack

def matrix_convolve_center(image,kernel,Nx,Ny,Nz):
    # Only get convolve result for the "central" block
    nx, ny, nz = kernel.shape
    rx = nx//2
    ry = ny//2
    rz = nz//2
    result = np.zeros((Nx, Ny, Nz))
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                result += kernel[i,j,k] * image[Nx+i-rx:2*Nx+i-rx,Ny+j-ry:2*Ny+j-ry,Nz+k-rz:2*Nz+k-rz]
    return result

def matrix_convolve3(image,kernel):

    Nx, Ny, Nz = image.shape
    nx, ny, nz = kernel.shape

    extended_image = np.tile(image,(3,3,3))
    result = matrix_convolve_center(extended_image,kernel,Nx, Ny, Nz)
    return result

P=0   # parity
CH=10 # chunk size

# make integer example, so exact soln is readily available
image = np.random.randint(0,100,(8*CH+P,8*CH+P,8*CH+P))
kernel = np.random.randint(0,100,(2*CH+P,2*CH+P,2*CH+P))
kerpad = np.zeros_like(image)
kerpad[3*CH:-3*CH,3*CH:-3*CH,3*CH:-3*CH]=kernel[::-1,::-1,::-1]
cexa = np.round(fftpack.fftshift(fftpack.ifftn(fftpack.fftn(fftpack.ifftshift(image))*fftpack.fftn(fftpack.ifftshift(kerpad)))).real).astype(int)
# sanity check
assert cexa.sum() == kernel.sum() * image.sum()

# normalize to preclude integer arithmetic during the actual test
image = image / image.sum()
kernel = kernel / kernel.sum()
cexa = cexa / cexa.sum()

# fft method
kerpad = np.zeros_like(image)
kerpad[3*CH:-3*CH,3*CH:-3*CH,3*CH:-3*CH]=kernel[::-1,::-1,::-1]
cfft = fftpack.fftshift(fftpack.ifftn(fftpack.fftn(fftpack.ifftshift(image))*fftpack.fftn(fftpack.ifftshift(kerpad))))

def direct_pp(image,kernel):
    nx,ny,nz = image.shape
    kx,ky,kz = kernel.shape
    out = np.zeros_like(image)
    image = np.concatenate([image[...,-kz//2+1:],image,image[...,:kz//2+P]],axis=2)
    image = np.concatenate([image[:,-ky//2+1:],image,image[:,:ky//2+P]],axis=1)
    image = np.concatenate([image[-kx//2+1:],image,image[:kx//2+P]],axis=0)
    mx,my,mz = image.shape
    ox,oy,oz = 2*mx-nx,2*my-ny,2*mz-nz
    aux = np.empty((ox,oy,kx,ky),image.dtype)
    s0,s1,s2,s3 = aux.strides
    aux2 = np.lib.stride_tricks.as_strided(aux[kx-1:,ky-1:],(mx,my,kx,ky),(s0,s1,s2-s0,s3-s1))
    for z in range(nz):
        aux2[...] = np.einsum('ijm,klm',image[...,z:z+kz],kernel)
        out[...,z] = aux[kx-1:kx-1+nx,ky-1:ky-1+ny].sum((2,3))
    return out

# direct methods
print("How about a coffee? (This may take some time...)")

from time import perf_counter as pc

T = []
T.append(pc())
cdirpp = direct_pp(image,kernel)
T.append(pc())
cdir = np.roll(matrix_convolve3(image,kernel),P-1,(0,1,2))
T.append(pc())
# compare squared error
nrm = (cexa**2).sum()
print('accuracy')
print('fft   ',((cexa-cfft)*(cexa-cfft.conj())).real.sum()/nrm)
print('direct',((cexa-cdir)**2).sum()/nrm)
print('dir pp',((cexa-cdirpp)**2).sum()/nrm)
print('duration direct methods')
print('pp {} OP {}'.format(*np.diff(T)))

样品运行:

How about a coffee? (This may take some time...)
accuracy
fft    5.690597572945596e-32
direct 8.518853759493871e-30
dir pp 1.3317651721034386e-32
duration direct methods
pp 5.817311848048121 OP 20.05021938495338

这篇关于Python快速实现3D数组的卷积/互相关的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆