Keras/Tensorflow - fourier pointwise multiplication implementation of conv2d running 4x slower than spatial convolution


Question


According to the convolution theorem, convolution becomes pointwise multiplication in the Fourier domain, and the overhead of taking the Fourier transform has been shown to be outweighed by the gain from converting the convolution operation into a pointwise multiplication in many previous works, such as https://arxiv.org/abs/1312.5851.
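The theorem itself is easy to check numerically. A minimal NumPy sketch (array sizes and seed are arbitrary, and this uses circular convolution, which is what the FFT route computes):

```python
import numpy as np

# Circular convolution in the spatial domain vs pointwise
# multiplication of the Fourier transforms.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((8, 8))

# Direct circular convolution, naive quadruple loop.
n = x.shape[0]
direct = np.zeros_like(x)
for i in range(n):
    for j in range(n):
        for p in range(n):
            for q in range(n):
                direct[i, j] += x[p, q] * k[(i - p) % n, (j - q) % n]

# FFT route: transform, multiply pointwise, transform back.
via_fft = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

print(np.allclose(direct, via_fft))  # True
```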


To replicate this, I tried to replace the keras.layers.Conv2D() layer with a custom layer that accepts the rfft of the input data (I take the rfft of the data before feeding it into the model, to reduce training time), initialises 'no_of_kernels' kernels of the same size as the image, takes their rfft, multiplies the input and kernels pointwise, and returns the product (yes, without taking the irfft, since I want to train the rest of the network in the Fourier domain itself) -
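As a shape check for this preprocessing (illustrative, using NumPy's equivalent of the TensorFlow op): for a real 28x28 MNIST image, the 2-D real FFT keeps only the non-redundant half of the last axis, so the width 28 becomes 28 // 2 + 1 = 15.

```python
import numpy as np

# rfft of a real 28x28 image: the last axis shrinks to width // 2 + 1.
image = np.random.default_rng(2).standard_normal((28, 28))
spectrum = np.fft.rfft2(image)
print(spectrum.shape)  # (28, 15)
```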


In the layer, the call function is implemented as follows. Note - in my dataset (MNIST), image height = width, so the transpose works fine:

def call(self, x):
    fft_x = x  # (batch_size, height, width, in_channels), already rfft'd
    fft_kernel = tf.spectral.rfft2d(self.kernel)  # (in_channels, height, width, out_channels)
    fft_kernel = tf.transpose(fft_kernel, perm=[2, 1, 0, 3])  # (width, height, in_channels, out_channels)
    # Pointwise multiply, contracting over in_channels; valid since height == width
    output = tf.einsum('ijkl,jklo->ijko', fft_x, fft_kernel)
    return output


This code preserves the accuracy given by the Keras Conv2D layer, but it runs around 4 times slower than Conv2D, so the purpose of transforming into the Fourier domain is defeated. Could anyone please clarify why this happens, and how I can replicate the results of fast convolutions in the Fourier domain?


(Note - For anyone who might feel that tf.spectral.rfft2d(self.kernel) might be the overhead: it is not, as I have verified.


Also, I think the Conv2D function might be flattening the 4D input tensors and kernels to reduce the convolution to a matrix multiplication, as explained here. I could not think of any intelligent method of flattening etc. to perform the pointwise multiplication, other than viewing it as a dot product, as I have done with tf.einsum. Is there any intelligent method to do pointwise multiplication?) Thanks.
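For reference, the einsum 'ijkl,jklo->ijko' can be checked against an explicit broadcast-multiply-and-sum, which makes the "pointwise multiplication contracted over in_channels" reading concrete (shapes below are illustrative, not MNIST-sized):

```python
import numpy as np

# fft_x: (batch, height, width, in_channels), complex after the rfft.
# fft_kernel: (height, width, in_channels, out_channels).
rng = np.random.default_rng(1)
fft_x = rng.standard_normal((2, 4, 4, 3)) + 1j * rng.standard_normal((2, 4, 4, 3))
fft_kernel = rng.standard_normal((4, 4, 3, 5)) + 1j * rng.standard_normal((4, 4, 3, 5))

out_einsum = np.einsum('ijkl,jklo->ijko', fft_x, fft_kernel)

# Equivalent formulation: add axes so the arrays broadcast, multiply
# pointwise, then sum out the shared in_channels axis.
out_broadcast = (fft_x[..., :, np.newaxis] * fft_kernel[np.newaxis]).sum(axis=3)

print(out_einsum.shape)  # (2, 4, 4, 5): last axis is out_channels
print(np.allclose(out_einsum, out_broadcast))  # True
```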


Edit - Entire implementation of the layer, for reference:

class Fourier_Conv2D(Layer):
    def __init__(self, no_of_kernels, **kwargs):
        self.no_of_kernels = no_of_kernels
        super(Fourier_Conv2D, self).__init__(**kwargs)

    def build(self, input_shape):
        # Kernels are image-sized: (in_channels, height, width, out_channels)
        self.kernel_shape = (int(input_shape[3]), int(input_shape[1]),
                             int(input_shape[2]), self.no_of_kernels)
        self.kernel = self.add_weight(name='kernel',
                                      shape=self.kernel_shape,
                                      initializer='uniform',
                                      trainable=True)
        super(Fourier_Conv2D, self).build(input_shape)

    def call(self, x):
        fft_x = x  # input is already the rfft of the data
        fft_kernel = tf.spectral.rfft2d(self.kernel)
        fft_kernel = tf.transpose(fft_kernel, perm=[2, 1, 0, 3])
        output = tf.einsum('ijkl,jklo->ijko', fft_x, fft_kernel)
        return output

    def compute_output_shape(self, input_shape):
        # The einsum contracts over in_channels, so the last axis is
        # no_of_kernels (the width axis is already width // 2 + 1,
        # since the input was rfft'd before the model).
        return (input_shape[0], input_shape[1], input_shape[2], self.no_of_kernels)


Answer


I don't think your result is surprising at all. The implementation of Conv2D in Keras is left to the backend, and most backends (like TensorFlow) have very optimized versions of the convolution operations, especially if you use cuDNN. So your own version, which should be faster than a naive implementation, is still slower than a highly optimized one.


It's possible that, in order to make a meaningful comparison, you will have to implement a baseline Conv2D that does the convolution in a naive way, without any kind of optimizations.
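Such a naive baseline might look like the following sketch (single-channel, "valid" padding, implemented as cross-correlation the way deep-learning libraries do; the function name and sizes are illustrative):

```python
import numpy as np

def naive_conv2d(image, kernel):
    """Unoptimized 2-D 'valid' convolution: one explicit window per output pixel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output pixel is the pointwise product of a window
            # with the kernel, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))
print(naive_conv2d(image, kernel))
# [[10. 14. 18.]
#  [26. 30. 34.]
#  [42. 46. 50.]]
```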

