如何计算卷积神经网络的参数数量? [英] How to calculate the number of parameters for convolutional neural network?

查看:563
本文介绍了如何计算卷积神经网络的参数数量?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Lasagne为MNIST数据集创建CNN.我正在密切关注此示例:卷积神经网络和使用Python提取特征.

我目前拥有的CNN架构(其中不包括任何辍学层)是:

NeuralNet(
    layers=[('input', layers.InputLayer),        # Input Layer
            ('conv2d1', layers.Conv2DLayer),     # Convolutional Layer
            ('maxpool1', layers.MaxPool2DLayer), # 2D Max Pooling Layer
            ('conv2d2', layers.Conv2DLayer),     # Convolutional Layer
            ('maxpool2', layers.MaxPool2DLayer), # 2D Max Pooling Layer
            ('dense', layers.DenseLayer),        # Fully connected layer
            ('output', layers.DenseLayer),       # Output Layer
            ],
    # input layer
    input_shape=(None, 1, 28, 28),

    # layer conv2d1
    conv2d1_num_filters=32,
    conv2d1_filter_size=(5, 5),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,

    # layer maxpool1
    maxpool1_pool_size=(2, 2),

    # layer conv2d2
    conv2d2_num_filters=32,
    conv2d2_filter_size=(3, 3),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,

    # layer maxpool2
    maxpool2_pool_size=(2, 2),


    # Fully Connected Layer
    dense_num_units=256,
    dense_nonlinearity=lasagne.nonlinearities.rectify,

   # output Layer
    output_nonlinearity=lasagne.nonlinearities.softmax,
    output_num_units=10,

    # optimization method params
    update= momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    max_epochs=10,
    verbose=1,
    )

这将输出以下图层信息:

  #  name      size
---  --------  --------
  0  input     1x28x28
  1  conv2d1   32x24x24
  2  maxpool1  32x12x12
  3  conv2d2   32x10x10
  4  maxpool2  32x5x5
  5  dense     256
  6  output    10

并输出可学习参数的数量为 217,706

我想知道这个数字是如何计算的?我已经阅读了许多资源,包括此StackOverflow的解决方案

首先让我们看看如何为您拥有的每种单独类型的图层计算可学习参数的数量,然后在示例中计算参数的数量. /p>

最后一个困难是第一个完全连接的层:由于该层是卷积层,因此我们不知道该层的输入的维数.要计算它,我们必须从输入图像的大小开始,并计算每个卷积层的大小.您的情况下,千层面已经为您计算了出来并报告了尺寸-这使我们很容易.如果必须自己计算每个图层的大小,则要复杂一些:

  • 在最简单的情况下(例如您的示例),卷积层的输出大小为input_size - (filter_size - 1),在您的情况下为:28-4 =24.这是由于卷积的性质造成的:我们使用例如5x5邻域来计算点-但是最外面的两个行和列没有5x5邻域,因此我们无法为这些点计算任何输出.这就是为什么我们的输出比输入小2 * 2 = 4行/列的原因.
  • 如果不希望输出的大小小于输入的大小,则可以对图像进行零填充(使用Lasagne中卷积层的pad参数).例如.如果在图像周围添加零的2行/列零,则输出大小将为(28 + 4)-4 = 28.因此,在填充的情况下,输出大小为input_size + 2*padding - (filter_size -1).
  • 如果您明确希望在卷积过程中对图像进行降采样,则可以定义一个步幅,例如stride=2,表示您以2像素为步长移动滤镜.然后,表达式变为((input_size + 2*padding - filter_size)/stride) +1.

在您的情况下,完整的计算为:

  #  name                           size                 parameters
---  --------  -------------------------    ------------------------
  0  input                       1x28x28                           0
  1  conv2d1   (28-(5-1))=24 -> 32x24x24    (5*5*1+1)*32   =     832
  2  maxpool1                   32x12x12                           0
  3  conv2d2   (12-(3-1))=10 -> 32x10x10    (3*3*32+1)*32  =   9'248
  4  maxpool2                     32x5x5                           0
  5  dense                           256    (32*5*5+1)*256 = 205'056
  6  output                           10    (256+1)*10     =   2'570

因此,在您的网络中,您总共拥有832 + 9'248 + 205'056 + 2'570 = 217'706可学习的参数,而这正是Lasagne报告的.

I'm using Lasagne to create a CNN for the MNIST dataset. I'm following closely to this example: Convolutional Neural Networks and Feature Extraction with Python.

The CNN architecture I have at the moment, which doesn't include any dropout layers, is:

NeuralNet(
    layers=[('input', layers.InputLayer),        # Input Layer
            ('conv2d1', layers.Conv2DLayer),     # Convolutional Layer
            ('maxpool1', layers.MaxPool2DLayer), # 2D Max Pooling Layer
            ('conv2d2', layers.Conv2DLayer),     # Convolutional Layer
            ('maxpool2', layers.MaxPool2DLayer), # 2D Max Pooling Layer
            ('dense', layers.DenseLayer),        # Fully connected layer
            ('output', layers.DenseLayer),       # Output Layer
            ],
    # input layer
    input_shape=(None, 1, 28, 28),

    # layer conv2d1
    conv2d1_num_filters=32,
    conv2d1_filter_size=(5, 5),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,

    # layer maxpool1
    maxpool1_pool_size=(2, 2),

    # layer conv2d2
    conv2d2_num_filters=32,
    conv2d2_filter_size=(3, 3),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,

    # layer maxpool2
    maxpool2_pool_size=(2, 2),


    # Fully Connected Layer
    dense_num_units=256,
    dense_nonlinearity=lasagne.nonlinearities.rectify,

   # output Layer
    output_nonlinearity=lasagne.nonlinearities.softmax,
    output_num_units=10,

    # optimization method params
    update= momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    max_epochs=10,
    verbose=1,
    )

This outputs the following Layer Information:

  #  name      size
---  --------  --------
  0  input     1x28x28
  1  conv2d1   32x24x24
  2  maxpool1  32x12x12
  3  conv2d2   32x10x10
  4  maxpool2  32x5x5
  5  dense     256
  6  output    10

and outputs the number of learnable parameters as 217,706

I'm wondering how this number is calculated? I've read a number of resources, including this StackOverflow's question, but none clearly generalizes the calculation.

If possible, can the calculation of the learnable parameters per layer be generalised?

For example, convolutional layer: number of filters x filter width x filter height.

解决方案

Let's first look at how the number of learnable parameters is calculated for each individual type of layer you have, and then calculate the number of parameters in your example.

  • Input layer: All the input layer does is read the input image, so there are no parameters you could learn here.
  • Convolutional layers: Consider a convolutional layer which takes l feature maps at the input, and has k feature maps as output. The filter size is n x m. For example, this will look like this:

    Here, the input has l=32 feature maps as input, k=64 feature maps as output, and the filter size is n=3 x m=3. It is important to understand, that we don't simply have a 3x3 filter, but actually a 3x3x32 filter, as our input has 32 dimensions. And we learn 64 different 3x3x32 filters. Thus, the total number of weights is n*m*k*l. Then, there is also a bias term for each feature map, so we have a total number of parameters of (n*m*l+1)*k.

  • Pooling layers: The pooling layers e.g. do the following: "replace a 2x2 neighborhood by its maximum value". So there is no parameter you could learn in a pooling layer.
  • Fully-connected layers: In a fully-connected layer, all input units have a separate weight to each output unit. For n inputs and m outputs, the number of weights is n*m. Additionally, you have a bias for each output node, so you are at (n+1)*m parameters.
  • Output layer: The output layer is a normal fully-connected layer, so (n+1)*m parameters, where n is the number of inputs and m is the number of outputs.

The final difficulty is the first fully-connected layer: we do not know the dimensionality of the input to that layer, as it is a convolutional layer. To calculate it, we have to start with the size of the input image, and calculate the size of each convolutional layer. In your case, Lasagne already calculates this for you and reports the sizes - which makes it easy for us. If you have to calculate the size of each layer yourself, it's a bit more complicated:

  • In the simplest case (like your example), the size of the output of a convolutional layer is input_size - (filter_size - 1), in your case: 28 - 4 = 24. This is due to the nature of the convolution: we use e.g. a 5x5 neighborhood to calculate a point - but the two outermost rows and columns don't have a 5x5 neighborhood, so we can't calculate any output for those points. This is why our output is 2*2=4 rows/columns smaller than the input.
  • If one doesn't want the output to be smaller than the input, one can zero-pad the image (with the pad parameter of the convolutional layer in Lasagne). E.g. if you add 2 rows/cols of zeros around the image, the output size will be (28+4)-4=28. So in case of padding, the output size is input_size + 2*padding - (filter_size -1).
  • If you explicitly want to downsample your image during the convolution, you can define a stride, e.g. stride=2, which means that you move the filter in steps of 2 pixels. Then, the expression becomes ((input_size + 2*padding - filter_size)/stride) +1.

In your case, the full calculations are:

  #  name                           size                 parameters
---  --------  -------------------------    ------------------------
  0  input                       1x28x28                           0
  1  conv2d1   (28-(5-1))=24 -> 32x24x24    (5*5*1+1)*32   =     832
  2  maxpool1                   32x12x12                           0
  3  conv2d2   (12-(3-1))=10 -> 32x10x10    (3*3*32+1)*32  =   9'248
  4  maxpool2                     32x5x5                           0
  5  dense                           256    (32*5*5+1)*256 = 205'056
  6  output                           10    (256+1)*10     =   2'570

So in your network, you have a total of 832 + 9'248 + 205'056 + 2'570 = 217'706 learnable parameters, which is exactly what Lasagne reports.

这篇关于如何计算卷积神经网络的参数数量?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆