Back propagation algorithm: error computation


Problem description

I am currently writing a back propagation script. I am unsure how to go about updating my weight values. Here is an image just to make things simple.

My question: How is the error calculated and applied?

I do know that k1 and k2 each produce an individual error value (target - output). I do not, however, know whether these are the values to be used.

Am I supposed to use the mean value of both error values and then apply that single error value to all of the weights?

Or am I supposed to:

update weights Wk1j1 and Wk1j2 with the error value of k1
update weights Wk2j1 and Wk2j2 with the error value of k2
update weights Wj1i1 and Wj1i2 with the error value of j1
update weights Wj2i1 and Wj2i2 with the error value of j2

Before you start shooting, I understand that I must use the sigmoid function etc. THIS IS NOT THE QUESTION. Every explanation states that I have to calculate the error value for the outputs, and this is where I am confused.

and then get the net error value by:

((error_k1^2) + (error_k2^2) + (error_j1^2) + (error_j2^2)) / 2

From the wiki:

As the wiki image states, this is true for each of the output nodes, in my image example k1 and k2.

The two rows under the image are delta Wh and delta Wi. Which error value am I supposed to use? (This is basically my question: which error value am I supposed to calculate the new weights with?)

Answer:

http://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf page 3 (noted as 18) #4

Recommended answer

Back-propagation does not use the error values directly. What you back-propagate is the partial derivative of the error with respect to each element of the neural network. Eventually that gives you dE/dW for each weight, and you make a small step in the direction of that gradient.

To do this, you need to know:

  • The activation value of each neuron (kept from when doing the feed-forward calculation)

  • The mathematical form of the error function (e.g. it may be a sum of squares difference). Your first set of derivatives will be dE/da for the output layer (where E is your error and a is the output of the neuron).

  • The mathematical form of the neuron activation or transfer function. This is where you discover why we use the sigmoid: dy/dx of the sigmoid function can conveniently be expressed in terms of the activation value, dy/dx = y * (1 - y). This is fast, and it also means you don't have to store or re-calculate the weighted sum (the identity is written out just below this list).
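For reference, that identity (a standard result, consistent with the sigmoid used in the code example below) is:

    y = \frac{1}{1 + e^{-z}}, \qquad \frac{dy}{dz} = y \, (1 - y)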

Please note, I am going to use different notation from you, because your labels make it hard to express the general form of back-propagation.

In my notation:

  • Superscripts in brackets (k) or (k+1) identify a layer in the network.

  • There are N neurons in layer (k), indexed with subscript i

  • There are M neurons in layer (k+1), indexed with subscript j

  • The sum of inputs to a neuron is z

  • The output of a neuron is a

  • A weight is Wij and connects ai in layer (k) to zj in layer (k+1). Note that W0j is the weight for the bias term, and sometimes you need to include it, although your diagram does not show bias inputs or weights.
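Putting that notation together, the feed-forward relation the steps below assume (a sketch, treating the bias as a fixed input of 1 attached to W0j, with f the transfer function) is:

    z_j^{(k+1)} = W_{0j} + \sum_{i=1}^{N} W_{ij} \, a_i^{(k)}, \qquad a_j^{(k+1)} = f\!\left(z_j^{(k+1)}\right)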

With the above notation, the general form of the back-propagation algorithm is a five-step process:

1) Calculate the initial dE/da for each neuron in the output layer, where E is your error value and a is the activation of the neuron. This depends entirely on your error function.
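The formula image from the original answer is not reproduced here; as a sketch, for the sum-of-squares error E = ½ Σ_j (a_j − t_j)² used in the code example below (with t_j the target output), this first derivative works out to:

    \frac{\partial E}{\partial a_j} = a_j - t_j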

Then, for each layer (start with k = maximum, your output layer)

2) Backpropagate dE/da to dE/dz for each neuron (where a is your neuron output and z is the sum of all inputs to it including the bias) within a layer. In addition to needing to know the value from (1) above, this uses the derivative of your transfer function:
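That formula, reconstructed in the notation above (with f the transfer function, using the sigmoid identity from earlier), is:

    \frac{\partial E}{\partial z_j} = \frac{\partial E}{\partial a_j} \cdot f'(z_j) = \frac{\partial E}{\partial a_j} \cdot a_j \, (1 - a_j)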

(Now reduce k by 1 for consistency with the remainder of the loop):

3) Backpropagate dE/dz from the upper layer to dE/da for all outputs in the previous layer. This basically involves summing across all of the weights connecting that output neuron to the inputs in the upper layer. You don't need to do this for the input layer. Note how it uses the value you calculated in (2):
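Again the original formula image is missing; in the notation above it would be the weighted sum:

    \frac{\partial E}{\partial a_i^{(k)}} = \sum_{j=1}^{M} W_{ij} \, \frac{\partial E}{\partial z_j^{(k+1)}}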

4) (Independently of (3)) Backpropagate dE/dz from an upper layer to dE/dW for all weights connecting that layer to the previous layer (this includes the bias term):
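The corresponding formula, reconstructed from the description (the activation feeding the weight times the dE/dz it feeds into, with a fixed activation of 1 for the bias weight W0j), is:

    \frac{\partial E}{\partial W_{ij}} = a_i^{(k)} \, \frac{\partial E}{\partial z_j^{(k+1)}}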

Simply repeat 2 to 4 until you have dE/dW for all your weights. For more advanced networks (e.g. recurrent), you can add in other error sources by re-doing step 1.

5) Now you have the weight derivatives, you can simply subtract them (times a learning rate) to take a step towards what you hope is the error function minimum:
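That update step, written out (η is the learning rate, learn_rate in the code below):

    W_{ij} \leftarrow W_{ij} - \eta \, \frac{\partial E}{\partial W_{ij}}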

The maths notation can seem a bit dense in places the first time you see this. But if you look a few times, you will see there are essentially only a few variables, and they are indexed by some combination of i, j, k values. In addition, with Matlab, you can express vectors and matrices really easily. So for instance this is what the whole process might look like for learning a single training example:

clear ; close all; clc; more off

InputVector          = [ 0.5, 0.2 ];
TrainingOutputVector = [ 0.1, 0.9 ];

learn_rate = 1.0;
W_InputToHidden  = randn( 3, 2 ) * 0.6;
W_HiddenToOutput = randn( 3, 2 ) * 0.6;

for i=1:20,
    % Feed-forward input to hidden layer
    InputsPlusBias = [1, InputVector];
    HiddenActivations = 1.0 ./ (1.0 + exp(-InputsPlusBias * W_InputToHidden));

    % Feed-forward hidden layer to output layer
    HiddenPlusBias = [ 1, HiddenActivations ];
    OutputActivations = 1.0 ./ (1.0 + exp(-HiddenPlusBias * W_HiddenToOutput));

    % Backprop step 1: dE/da for output layer (assumes mean square error)
    OutputErrors = OutputActivations - TrainingOutputVector;

    nn_error = sum( OutputErrors .* OutputErrors ) / 2;
    fprintf( 'Epoch %d, error %f\n', i, nn_error);

    % Backprop step 2: dE/da to dE/dz on the output layer (uses sigmoid derivative)
    OutputActivationDeltas = OutputErrors ...
      .* ( OutputActivations .* (1 - OutputActivations) );

    % Steps 3 & 2 combined:
    % Back propagate dE/dz on output layer to dE/dz on hidden layer
    % (sum across the weights, excluding the bias row, then apply the sigmoid derivative)
    HiddenActivationDeltas = ( OutputActivationDeltas * W_HiddenToOutput(2:end,:)' ) ...
      .* ( HiddenActivations .* (1 - HiddenActivations) );

    % Step 4 (twice):
    % Back propagate dE/dz to dE/dW for both weight matrices
    W_InputToHidden_Deltas  = InputsPlusBias' * HiddenActivationDeltas;
    W_HiddenToOutput_Deltas = HiddenPlusBias' * OutputActivationDeltas;

    % Step 5: Alter the weights
    W_InputToHidden  = W_InputToHidden  - learn_rate * W_InputToHidden_Deltas;
    W_HiddenToOutput = W_HiddenToOutput - learn_rate * W_HiddenToOutput_Deltas;
end;

As written this is stochastic gradient descent (weights altering once per training example), and obviously is only learning one training example.

Apologies for the pseudo-math notation in places. Stack Overflow doesn't have simple built-in LaTeX-like maths, unlike Math Overflow. I have skipped some of the derivation/explanation for steps 3 and 4 to avoid this answer taking forever.

