Gradient checking in backpropagation


Question

I'm trying to implement gradient checking for a simple feedforward neural network with a 2-unit input layer, a 2-unit hidden layer, and a 1-unit output layer. What I do is the following:

  1. For each weight w of the network, between all layers, perform a forward propagation using w + EPSILON and then using w - EPSILON.
  2. Compute the numerical gradient from the results of the two forward propagations.

What I don't understand is how exactly to perform the backpropagation. Normally, I compare the output of the network to the target data (in the case of classification) and then backpropagate the error derivative across the network. However, I think in this case some other value has to be backpropagated, since the result of the numerical gradient computation does not depend on the target data (only on the input), while the error backpropagation does depend on the target data. So, what value should be used in the backpropagation part of gradient checking?

Answer

Backpropagation is performed after computing the gradients analytically; those formulas are then used while training. A neural network is essentially a multivariate function whose coefficients, or parameters, need to be found or trained.

The gradient with respect to a specific variable is, by definition, the rate of change of the function value in that variable. Therefore, as you mentioned, from the definition of the first derivative we can numerically approximate the gradient of any function, including a neural network.
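As a minimal sketch of that first-derivative definition, the centered-difference approximation can be written as follows (`numerical_derivative` is an illustrative name, not from the original post):

```python
import math

def numerical_derivative(f, x, eps=1e-5):
    """Centered-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Example: the derivative of sin(x) at x = 0 is cos(0) = 1.
approx = numerical_derivative(math.sin, 0.0)
print(approx)
```

The same idea applies to a neural network: the "function" is the cost, and the "variable" is one weight at a time.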

To check whether the analytical gradient of your neural network is correct, it is good practice to verify it against the numerical one.

For each weight matrix w_l in all layers W = [w_0, w_1, ..., w_l, ..., w_k]:
    For i in 0 to number of rows in w_l:
        For j in 0 to number of columns in w_l:
            w_l_minus = copy(w_l);                  # copy all the weights, do not alias
            w_l_minus[i,j] = w_l_minus[i,j] - eps;  # perturb only this one parameter

            w_l_plus = copy(w_l);                   # copy all the weights
            w_l_plus[i,j] = w_l_plus[i,j] + eps;    # perturb only this one parameter

            cost_minus = cost of the network with w_l replaced by w_l_minus
            cost_plus  = cost of the network with w_l replaced by w_l_plus

            w_l_grad[i,j] = (cost_plus - cost_minus) / (2 * eps)

This process changes only one parameter at a time and computes the numerical gradient. Here I have used the centered difference (f(x+h) - f(x-h)) / 2h, which seems to work better for me than the one-sided formula.
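The pseudocode above could be sketched in NumPy roughly as follows; `numerical_gradient` and the toy cost are hypothetical names, and the explicit copies matter so that each perturbation touches exactly one parameter:

```python
import numpy as np

def numerical_gradient(cost_fn, W, eps=1e-5):
    """Centered-difference gradient of cost_fn with respect to W.

    Perturbs one parameter at a time, as in the pseudocode above.
    """
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus = W.copy()               # copy, do not alias
            W_plus[i, j] += eps
            W_minus = W.copy()
            W_minus[i, j] -= eps
            grad[i, j] = (cost_fn(W_plus) - cost_fn(W_minus)) / (2 * eps)
    return grad

# Toy check: cost(W) = sum(W**2) has gradient 2*W.
W = np.array([[1.0, 2.0], [3.0, -1.0]])
g = numerical_gradient(lambda M: np.sum(M ** 2), W)
print(np.allclose(g, 2 * W))
```

In a real gradient check, `cost_fn` would run a full forward pass with the perturbed weight matrix substituted in.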

Note that you mentioned: "the results of the numerical gradient computation are not dependent on the target data". This is not true. When you compute cost_minus and cost_plus above, the cost is computed on the basis of:

  1. The weights
  2. The target class

Therefore, the process of backpropagation should be independent of the gradient checking. Compute the numerical gradients before the backpropagation update, then compute the gradients using backpropagation in one epoch (on the same weights as above). Finally, compare each gradient component of the vectors/matrices and check whether they are close enough.
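To make the comparison concrete, here is a minimal NumPy sketch for the 2-2-1 sigmoid network from the question. All names (`cost`, `sigmoid`, the data `x` and `t`) and the squared-error cost are illustrative assumptions, not code from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-2-1 network with sigmoid activations.
W1 = rng.normal(size=(2, 2))   # input -> hidden weights
W2 = rng.normal(size=(1, 2))   # hidden -> output weights
x = np.array([0.5, -0.3])      # a single training input
t = np.array([1.0])            # its target: the cost depends on it

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, W2):
    """Forward pass plus squared-error cost."""
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    return 0.5 * np.sum((y - t) ** 2)

def numerical_gradient(cost_fn, W, eps=1e-5):
    """Centered differences, perturbing one parameter at a time."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy(); Wp[i, j] += eps
            Wm = W.copy(); Wm[i, j] -= eps
            grad[i, j] = (cost_fn(Wp) - cost_fn(Wm)) / (2 * eps)
    return grad

# Analytic gradients via backpropagation on the same weights.
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)
delta_out = (y - t) * y * (1.0 - y)             # error at the output layer
gW2_bp = np.outer(delta_out, h)
delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # error at the hidden layer
gW1_bp = np.outer(delta_hid, x)

# Numerical gradients of the same cost.
gW1_num = numerical_gradient(lambda W: cost(W, W2), W1)
gW2_num = numerical_gradient(lambda W: cost(W1, W), W2)

# Compare component-wise; the difference should be tiny (O(eps**2)).
print(np.max(np.abs(gW1_bp - gW1_num)), np.max(np.abs(gW2_bp - gW2_num)))
```

Because the cost depends on the target `t`, both the numerical and the backpropagated gradients depend on it too, which is exactly the answer's point.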
