Neural Networks: A step-by-step breakdown of the Backpropagation phase?


Problem description

I have to design an animated visual representation of a neural network that is functional (i.e. with UI that allows you to tweak values etc). The primary goal with it is to help people visualize how and when the different math operations are performed in a slow-motion, real-time animation. I have the visuals set up along with the UI that allows you to tweak values and change the layout of the neurons, as well as the visualizations for the feed forward stage, but since I don't actually specialize in neural networks at all, I'm having trouble figuring out the best way to visualize the back propagation phase, mainly because I've had trouble figuring out the exact order of operations during this stage.

The visualization starts by firing neurons forward, and then, after that chain of fired neurons reaches an output, an animation shows the difference between the actual and predicted values. From this point I want to visualize the network firing backwards while demonstrating the math that is taking place, but this is where I'm really unsure about what exactly is supposed to happen.

So my questions are:

  • Which weights are actually adjusted in the backpropagation phase? Are all of the weights adjusted throughout the entire neural network, or just the ones that fired during the forward pass?
  • Are all of the weights in each hidden layer adjusted by the same amount during this phase, or are they adjusted by a value that is offset by their current weight, or some other value? It didn't really make sense to me that they would all be adjusted by the same amount, without being offset by a curve or something of the sort.

I’ve found a lot of great information about the feed forward phase online, but when it comes to the backpropagation phase I’ve had a lot of trouble finding any good visualizations/explanations about what is actually happening during this phase.

Solution

Which weights are actually adjusted in the back-propagation phase? Are all of the weights adjusted throughout the entire neural network, or just the ones that fired during the forward pass?

It depends on how you build the neural network. Typically you forward-propagate through the network first, and then back-propagate; in the back-propagation phase, the weights are adjusted based on the error and the Sigmoid derivative (or the derivative of whatever activation function you use). It is up to you to choose which weights are adjusted, as well as the type of structure that you have. For a simple Perceptron network (based on what I know), every weight would be adjusted.
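
For example, in a small fully connected network, the backward pass produces a gradient for every weight in every layer, so every weight receives an update. Below is a minimal sketch of that idea, assuming Python with NumPy and Sigmoid activations; the network size, data, and names are illustrative, not taken from the answer above.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Illustrative 2-3-1 network: every weight below gets its own gradient.
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(2, 3))   # input layer -> hidden layer weights
    W2 = rng.normal(size=(3, 1))   # hidden layer -> output layer weights

    X = np.array([[0.5, 0.9]])     # one training example (illustrative)
    y = np.array([[1.0]])          # expected output

    # Forward pass
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Backward pass: the chain rule yields a gradient for every weight matrix
    error = y - output
    delta_out = error * output * (1 - output)                  # error scaled by the Sigmoid derivative
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    grad_W2 = hidden.T @ delta_out   # same shape as W2: all 3 of those weights are adjusted
    grad_W1 = X.T @ delta_hidden     # same shape as W1: all 6 of those weights are adjusted

    learning_rate = 0.5
    W2 += learning_rate * grad_W2
    W1 += learning_rate * grad_W1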

Are all of the weights in each hidden layer adjusted by the same amount during this phase, or are they adjusted by a value that is offset by their current weight, or some other value? It didn't really make sense to me that they would all be adjusted by the same amount, without being offset by a curve or something of the sort.

Back-propagation depends slightly on the type of structure you are using. You usually use some kind of algorithm, typically gradient descent or stochastic gradient descent, to control how much a weight is adjusted. From what I know, in a Perceptron network every weight is adjusted by its own value, that is, in proportion to how much that particular weight contributed to the error, so different weights receive different adjustments.
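
As a rough illustration with made-up numbers (not from the answer above): with the same learning rate, a weight whose error gradient is large moves much more than a weight whose gradient is small.

    learning_rate = 0.1

    # Hypothetical gradients of the error with respect to two different weights
    grad_w1 = 0.80
    grad_w2 = 0.05

    w1 = 0.30 - learning_rate * grad_w1   # 0.30 -> 0.22  (large adjustment)
    w2 = 0.30 - learning_rate * grad_w2   # 0.30 -> 0.295 (small adjustment)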

In conclusion, back-propagation is just a way to adjust the weights so that the output values are closer to the desired result. It might also help you to look into gradient descent, or to watch a network being built from scratch (I learned how to build neural networks by breaking them down step-by-step).

Here is my own version of a step-by-step breakdown of back-propagation (a runnable sketch of these steps follows the list):

  1. The error is calculated based on the difference between the actual outputs and the expected outputs.

  2. The adjustments matrix/vector is calculated by scaling the error by the Sigmoid derivative of the actual outputs, and then taking the dot product of the (transposed) training inputs with that scaled error.

  3. The adjustments are applied to the weights.

  4. Steps 1 - 3 are iterated many times until the actual outputs are close to the expected outputs.

EXT. In a more complicated neural network you might use stochastic gradient descent or gradient descent to find the best adjustments for the weights.
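
A minimal sketch of steps 1 - 4 for a single-layer network, assuming Python with NumPy and a Sigmoid activation; the training data and names are illustrative:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(s):
        # s is the Sigmoid *output*, so the derivative is s * (1 - s)
        return s * (1 - s)

    # Illustrative training data: 4 examples with 3 inputs each
    training_inputs = np.array([[0, 0, 1],
                                [1, 1, 1],
                                [1, 0, 1],
                                [0, 1, 1]], dtype=float)
    expected_outputs = np.array([[0.0], [1.0], [1.0], [0.0]])

    rng = np.random.default_rng(1)
    weights = rng.uniform(-1, 1, size=(3, 1))

    for _ in range(10000):
        # Forward pass
        actual_outputs = sigmoid(training_inputs @ weights)

        # Step 1: error between the expected and actual outputs
        error = expected_outputs - actual_outputs

        # Step 2: adjustments = transposed inputs . (error scaled by the Sigmoid derivative of the outputs)
        adjustments = training_inputs.T @ (error * sigmoid_derivative(actual_outputs))

        # Step 3: apply the adjustments to the weights
        weights += adjustments

    # Step 4: after many iterations the actual outputs approach the expected outputs
    print(sigmoid(training_inputs @ weights))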

Edit on Gradient Descent:

Gradient descent is a method of finding a good adjustment value to change your weights in back-propagation; it is based on the derivative (gradient) of the error with respect to each weight.

Gradient Descent Formulae: f(X) = X * (1 - X). Strictly speaking, this f(X) is the derivative of the Sigmoid function, where X is the Sigmoid output; it is the factor that scales the error when the weight adjustments are computed. The gradient descent update itself then moves each weight a small step against the slope of the error: weight = weight - learning_rate * (derivative of the error with respect to that weight).

Gradient Descent Formulae (Programmatic):
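
A minimal sketch of what that might look like in Python; sigmoid_output, weight_input, error, and learning_rate are illustrative names, not the answer's original code:

    def sigmoid_derivative(sigmoid_output):
        # f(X) = X * (1 - X), where X is the Sigmoid output
        return sigmoid_output * (1 - sigmoid_output)

    def adjust_weight(weight, weight_input, error, sigmoid_output, learning_rate=1.0):
        # Scale the error by the Sigmoid derivative, then move the weight
        # a step in the direction that reduces the error (error = expected - actual).
        gradient = error * sigmoid_derivative(sigmoid_output) * weight_input
        return weight + learning_rate * gradient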

Gradient Descent Explanation:

Gradient descent is a method which involves finding the best adjustment to a weight, so that good weight values can be found. During the back-propagation iterations, the further the actual output is from the expected output, the bigger the change to the weights is. You can imagine it as an upside-down hill (a valley): in each iteration, the ball rolling down the slope moves quickly at first and then slows down as it approaches the bottom.

Credit to Clairvoyant.
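
To make the hill analogy concrete, here is a tiny sketch; the error curve E(w) = w * w and the starting values are purely illustrative. The steps are large while the slope is steep and shrink as the weight approaches the bottom:

    w = 4.0                      # starting weight, far from the optimum at w = 0
    learning_rate = 0.2

    for step in range(6):
        gradient = 2 * w         # slope of the error curve E(w) = w * w
        w = w - learning_rate * gradient
        print(step, round(w, 4))

    # w goes 4.0 -> 2.4 -> 1.44 -> 0.864 -> ... : each step is smaller than the last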

Stochastic gradient descent is a more advanced variant used when the best weight values are harder, or slower, to find with plain gradient descent: instead of computing each update from the whole training set, it updates the weights using one randomly chosen training example (or a small batch) at a time, as sketched below. This might not be the best explanation, so for a much clearer explanation of stochastic gradient descent, refer to this video.
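
As a rough sketch of the difference (reusing the same kind of single-layer setup as the loop above; the data and names are illustrative), stochastic gradient descent computes each update from one randomly chosen training example instead of the whole training set:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    training_inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
    expected_outputs = np.array([[0.0], [1.0], [1.0], [0.0]])

    rng = np.random.default_rng(2)
    weights = rng.uniform(-1, 1, size=(3, 1))
    learning_rate = 1.0

    for _ in range(10000):
        i = rng.integers(len(training_inputs))   # pick ONE example at random
        x = training_inputs[i:i + 1]             # shape (1, 3)
        y = expected_outputs[i:i + 1]            # shape (1, 1)

        output = sigmoid(x @ weights)
        error = y - output
        adjustment = x.T @ (error * output * (1 - output))
        weights += learning_rate * adjustment    # update based on this single example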
