How is using im2col operation in convolutional nets more efficient?


Question

I am trying to implement a convolutional neural network and I don't understand why using the im2col operation is more efficient. It basically stores the input patches that will be multiplied by the filter in separate columns. But why shouldn't loops be used directly to compute the convolution instead of performing im2col first?

Answer

Well, you are thinking in the right way. In AlexNet, almost 95% of the GPU time and 89% of the CPU time is spent in the convolutional layer and the fully connected layer.

The convolutional layer and the fully connected layer are implemented using GEMM, which stands for General Matrix-Matrix Multiplication.

So basically, in the GEMM approach we convert the convolution operation into a matrix multiplication using a function called im2col(), which arranges the data in such a way that the convolution output can be computed by a matrix multiplication.
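To make this concrete, here is a minimal NumPy sketch of the idea, assuming a single-channel input, stride 1, and no padding (the names `im2col` and `conv2d_gemm` are illustrative, not from any particular library):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every kh-by-kw patch of x into one column of the result."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_gemm(x, kernel):
    """2-D convolution (cross-correlation) expressed as one matrix product."""
    kh, kw = kernel.shape
    cols = im2col(x, kh, kw)          # shape: (kh*kw, out_h*out_w)
    out = kernel.ravel() @ cols       # all the arithmetic happens in one GEMM
    return out.reshape(x.shape[0] - kh + 1, x.shape[1] - kw + 1)

x = np.arange(16.0).reshape(4, 4)
k = np.ones((3, 3))
print(conv2d_gemm(x, k))              # matches a direct sliding-window result
```

With a bank of filters, each filter becomes one row of a (num_filters, kh*kw) matrix, so an entire layer's worth of filters is applied by a single GEMM over the same `cols` buffer.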

Now you may ask: instead of directly doing element-wise convolution, why do we add a step in between that arranges the data in a different way, and then use GEMM?

The answer to this is that scientific programmers have spent decades optimizing code that performs large matrix-matrix multiplications, and the benefits from the very regular memory-access patterns outweigh any other losses. We have an optimized CUDA GEMM API in the cuBLAS library, Intel MKL provides an optimized CPU GEMM, and clBLAS's GEMM API can be used on devices that support OpenCL.
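As a small illustration of how accessible these tuned kernels are (assuming a NumPy build linked against an optimized BLAS such as MKL or OpenBLAS):

```python
import numpy as np

A = np.random.rand(512, 512).astype(np.float32)
B = np.random.rand(512, 512).astype(np.float32)
C = A @ B   # dispatches to the BLAS sgemm routine under the hood
```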

Element-wise convolution performs badly because of the irregular memory accesses involved.
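For contrast, a direct sliding-window version (same single-channel, stride-1 assumptions as the sketch above) re-reads a scattered kh-by-kw patch for every output element, so the inner loop keeps jumping across rows of the image instead of streaming through contiguous memory:

```python
def conv2d_naive(x, kernel):
    """Direct convolution: every output element gathers its own patch."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # x[i:i+kh, j:j+kw] touches kh non-contiguous rows of memory
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out
```

im2col pays this gathering cost once, after which every filter reuses the same regularly laid-out buffer.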

im2col(), in turn, arranges the data in such a way that the memory accesses are regular for the matrix multiplication.

The im2col() function does add a lot of data redundancy, but the performance benefit of using GEMM outweighs that redundancy.
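To put a number on it: with a K×K kernel and stride 1, each interior input element appears in up to K² patches, so the im2col buffer is roughly K² times the size of the input (about 9× for a 3×3 kernel).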

This is the reason for using the im2col() operation in neural nets.

This link explains how im2col() arranges the data for GEMM: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
