Convolutional neural network not converging

Problem Description

I've been watching some videos on deep learning/convolutional neural networks, like here and here, and I tried to implement my own in C++. I tried to keep the input data fairly simple for my first attempt, so the idea is to differentiate between a cross and a circle. I have a small data set of around 25 of each (64*64 images), and they look like this:

The network itself consists of five layers:

Convolution (5 filters, size 3, stride 1, with a ReLU)
MaxPool (size 2) 
Convolution (1 filter, size 3, stride 1, with a ReLU)
MaxPool (size 2)
Linear Regression classifier
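
For reference, assuming "valid" convolutions (no padding) and floor division in the pooling, the feature-map sizes for a 64*64 input work out roughly as in the sketch below (this is a rough sanity check under those assumptions, not part of the original code):

// Rough feature-map size check for a 64x64 input, assuming "valid"
// convolutions (no padding) and floor division in the pooling layers.
#include <cstdio>

int main()
{
    int size = 64;
    size = size - 3 + 1;   // conv, filter size 3, stride 1 -> 62
    size = size / 2;       // max pool, size 2              -> 31
    size = size - 3 + 1;   // conv, filter size 3, stride 1 -> 29
    size = size / 2;       // max pool, size 2              -> 14

    // The classifier then sees size*size inputs (plus a bias weight).
    std::printf("Classifier input size: %d*%d = %d\n", size, size, size * size);
    return 0;
}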

My issue is that my network isn't converging on anything. None of the weights appear to change. If I run it, the predictions mostly stay the same, apart from the occasional outlier which jumps up before returning on the next iteration.

The convolutional layer's training looks something like this (some loops removed to make it cleaner):

// Yeah, I know I should change the shared_ptr<float>
void ConvolutionalNetwork::Train(std::shared_ptr<float> input,std::shared_ptr<float> outputGradients, float label)
{
    float biasGradient = 0.0f;

    // Calculate the deltas with respect to the input.
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        // Pseudo-code, each loop on its own line in actual code
        For z < depth, x < width - filterSize, y < height - filterSize
        {               
            int newImageIndex = layer*m_OutputWidth*m_OutputHeight+y*m_OutputWidth + x;

            For the bounds of the filter (U,V)
            {
                // Find the index in the input image
                int imageIndex = x + (y+v)*m_OutputWidth + z*m_OutputHeight*m_OutputWidth;
                int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
                m_pGradients.get()[imageIndex] += outputGradients.get()[newImageIndex]*input.get()[imageIndex];
                m_GradientSum[layer].get()[kernelIndex] += m_pGradients.get()[imageIndex] * m_Filters[layer].get()[kernelIndex];

                biasGradient += m_GradientSum[layer].get()[kernelIndex];
            }       
        }
    }

    // Update the weights
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        For z < depth, U & V < filtersize
        {
            // Find the index in the input image
            int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
            m_Filters[layer].get()[kernelIndex] -= learningRate*m_GradientSum[layer].get()[kernelIndex];
        }
        m_pBiases.get()[layer] -= learningRate*biasGradient;
    }
}

So, I create a buffer (m_pGradients) which has the dimensions of the input buffer, and use it to feed the gradients back to the previous layer, but I use the gradient sum to adjust the weights.

The max-pooling layer propagates the gradients back like so (it saves the max indices and zeroes out all the other gradients):

void MaxPooling::Train(std::shared_ptr<float> input,std::shared_ptr<float> outputGradients, float label)
{
    for (int outputVolumeIndex = 0; outputVolumeIndex < m_OutputVolumeSize; ++outputVolumeIndex)
    {
        int inputIndex = m_Indices.get()[outputVolumeIndex];
        m_pGradients.get()[inputIndex] = outputGradients.get()[outputVolumeIndex];
    }
}
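
The forward pass is not shown above; a pooling layer like this would typically record the argmax index of each window so that Train can route the gradient back to it. A rough single-channel sketch of what that might look like, reusing the member names from the snippet above (the signature and layout are assumptions, not the actual implementation):

// Sketch of a max-pooling forward pass that records argmax indices.
// Assumes a single-channel, row-major layout; m_Indices and
// m_OutputVolumeSize are taken from the Train() snippet above.
void MaxPooling::Forward(const float* input, float* output,
                         int inputWidth, int inputHeight, int poolSize)
{
    int outputWidth  = inputWidth  / poolSize;
    int outputHeight = inputHeight / poolSize;

    for (int oy = 0; oy < outputHeight; ++oy)
    {
        for (int ox = 0; ox < outputWidth; ++ox)
        {
            int bestIndex = (oy * poolSize) * inputWidth + ox * poolSize;
            float bestValue = input[bestIndex];

            // Scan the pooling window for the maximum value.
            for (int v = 0; v < poolSize; ++v)
            {
                for (int u = 0; u < poolSize; ++u)
                {
                    int index = (oy * poolSize + v) * inputWidth + (ox * poolSize + u);
                    if (input[index] > bestValue)
                    {
                        bestValue = input[index];
                        bestIndex = index;
                    }
                }
            }

            int outputIndex = oy * outputWidth + ox;
            output[outputIndex] = bestValue;
            // Remember which input element won, so Train() can send the
            // incoming gradient back to exactly that element.
            m_Indices.get()[outputIndex] = bestIndex;
        }
    }
}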

And the final regression layer calculates its gradients like this:

void LinearClassifier::Train(std::shared_ptr<float> data,std::shared_ptr<float> output, float y)
{
    float* x = data.get();

    float biasError = 0.0f;
    float h = Hypothesis(output) - y;

    for (int i = 1; i < m_NumberOfWeights; ++i)
    {
        float error = h*x[i];
        m_pGradients.get()[i] = error;
        biasError += error;
    }

    float cost = h;
    m_Error = cost*cost;

    for (int theta = 1; theta < m_NumberOfWeights; ++theta)
    {
        m_pWeights.get()[theta] = m_pWeights.get()[theta] - learningRate*m_pGradients.get()[theta];
    }

    m_pWeights.get()[0] -= learningRate*biasError;
}
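
Hypothesis is not shown either; for a linear-regression classifier it would normally be the bias plus the weighted sum of the inputs. A hypothetical sketch using the member names from the snippet above (not the actual implementation):

// Hypothetical sketch of the missing Hypothesis(): a plain linear model
// h(x) = w_0 + sum_i w_i * x_i, using the members from the snippet above.
float LinearClassifier::Hypothesis(std::shared_ptr<float> data)
{
    const float* x = data.get();
    float h = m_pWeights.get()[0];          // bias term (theta_0)

    for (int i = 1; i < m_NumberOfWeights; ++i)
        h += m_pWeights.get()[i] * x[i];    // weighted sum of the inputs

    return h;
}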

After 100 iterations of training on the two examples, the prediction on each is the same as the other and unchanged from the start.

  1. Would a convolutional network like this be able to differentiate between the two classes?
  2. Is this the correct approach?
  3. Should I be accounting for the ReLU (max) in the convolution layer backpropagation?

Answer

  1. Would a convolutional network like this be able to differentiate between the two classes?

Yes. In fact, even a linear classifier by itself should be able to discriminate very easily (if the images are more or less centered).

  2. Is this the correct approach?

The most probable cause is an error in your gradient formulas. Always follow two easy rules:

  1. Start with a basic model. Do not start with a 2-conv network; start your code without any convolutions. Does it work now? Once you have a working single linear layer, add a single convolution. Does it work now? And so on.
  2. Always check your gradients numerically. This is easy to do and will save you hours of debugging! Recall from analysis that

[grad f(x)]_i ≈ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)

where [·]_i denotes the i-th coordinate, and e_i denotes the i-th canonical vector (a zero vector with a one in the i-th coordinate).
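
A minimal sketch of such a central-difference check in C++ (the loss callback and parameter vector here are illustrative placeholders, not part of the question's classes):

#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

// Compare an analytic gradient against the central-difference estimate
// grad_i ~= (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps).
// `loss` evaluates the network's loss for a given parameter vector.
bool CheckGradient(const std::function<float(const std::vector<float>&)>& loss,
                   std::vector<float> weights,
                   const std::vector<float>& analyticGrad,
                   float eps = 1e-3f, float tolerance = 1e-2f)
{
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        const float original = weights[i];

        weights[i] = original + eps;
        const float lossPlus = loss(weights);

        weights[i] = original - eps;
        const float lossMinus = loss(weights);

        weights[i] = original;   // restore the parameter

        const float numericGrad = (lossPlus - lossMinus) / (2.0f * eps);
        if (std::fabs(numericGrad - analyticGrad[i]) > tolerance)
        {
            std::printf("Gradient mismatch at %zu: analytic %f vs numeric %f\n",
                        i, analyticGrad[i], numericGrad);
            return false;
        }
    }
    return true;
}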

  3. Should I be accounting for the ReLU (max) in the convolution layer backpropagation?

Yes, ReLU alters your gradient, as it is a nonlinearity that you need to differentiate through. Again, back to point 1: start with simple models, and add each element separately to find which one causes your gradients/model to break.
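
Concretely, if the forward pass applied out = max(0, preActivation), then the backward pass should only let the gradient through where the pre-activation was positive. A small illustrative sketch (the names are placeholders, not from the question's code):

// Backprop through a ReLU: the gradient passes through unchanged where the
// pre-activation was positive and is zeroed everywhere else.
void ReLUBackward(const float* preActivation, const float* outputGradients,
                  float* inputGradients, int size)
{
    for (int i = 0; i < size; ++i)
        inputGradients[i] = (preActivation[i] > 0.0f) ? outputGradients[i] : 0.0f;
}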
