Neural Networks: A step-by-step breakdown of the Backpropagation phase?


Problem Description


I have to design an animated visual representation of a neural network that is functional (i.e. with a UI that allows you to tweak values etc). The primary goal is to help people visualize how and when the different math operations are performed in a slow-motion, real-time animation. I have the visuals set up along with the UI that allows you to tweak values and change the layout of the neurons, as well as the visualizations for the feed-forward stage. But since I don't actually specialize in neural networks at all, I'm having trouble figuring out the best way to visualize the backpropagation phase, mainly because I've had trouble pinning down the exact order of operations during this stage.

The visualization starts by firing neurons forward, and then after that chain of fired neurons reaches an output, an animation shows the difference between the actual and predicted values. From that point I want to visualize the network firing backwards while demonstrating the math that is taking place. But this is where I am really unsure about what exactly is supposed to happen.

So my questions are:

  • Which weights are actually adjusted in the backpropagation phase? Are all of the weights adjusted throughout the entire neural network, or just the ones that fired during the forward pass?
  • Are all of the weights in each hidden layer adjusted by the same amount during this phase, or are they adjusted by a value that is offset by their current weight, or some other value? It didn't really make sense to me that they would all be adjusted by the same amount, without being offset by a curve or something of the sort.

I’ve found a lot of great information about the feed forward phase online, but when it comes to the backpropagation phase I’ve had a lot of trouble finding any good visualizations/explanations about what is actually happening during this phase.

Solution

Which weights are actually adjusted in the back-propagation phase? Are all of the weights adjusted throughout the entire neural network, or just the ones that fired during the forward pass?

It depends on how you build the neural network. Typically you forward-propagate through the network first and then back-propagate; during the back-propagation phase, the weights are adjusted based on the error and the Sigmoid derivative. Exactly which weights are updated depends on the structure you have, but for a simple Perceptron network every weight in the network is adjusted, not just the ones on the path that happened to fire strongly during the forward pass.

Are all of the weights in each hidden layer adjusted by the same amount during this phase, or are they adjusted by a value that is offset by their current weight, or some other value? It didn't really make sense to me that they would all be adjusted by the same amount, without being offset by a curve or something of the sort.

Back-propagation depends somewhat on the type of structure you are using. You usually use some kind of algorithm, typically gradient descent or stochastic gradient descent, to control how much each weight is adjusted. From what I know, in a Perceptron network every weight is adjusted by its own amount: the size of each update depends on that weight's contribution to the error, so the weights are not all changed by the same value.

In conclusion, back-propagation is just a way to adjust the weights so that the output values get closer to the desired result. It might also help you to look into gradient descent, or to watch a network being built from scratch (I learned how to build neural networks by breaking them down step by step).

Here is my own version of a step-by-step breakdown of back-propagation (a minimal code sketch of these steps follows the list):

  1. The error is calculated based on the difference between the actual outputs and the expected outputs.

  2. The adjustments matrix/vector is calculated by taking the dot product of the (transposed) training inputs with the error multiplied element-wise by the Sigmoid derivative of the actual outputs.

  3. The adjustments are applied to the weights.

  4. Steps 1 - 3 are iterated many times until the actual outputs are close to the expected outputs.

EXT. In a more complicated neural network you might use stochastic gradient descent or gradient descent to find the best adjustments for the weights.
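To make the four steps concrete, here is a minimal sketch in Python/NumPy of a single-layer, Sigmoid-activated network trained this way. The tiny training set, the layer sizes, and the iteration count are illustrative assumptions, not something taken from the question:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is assumed to already be a Sigmoid output, so f'(x) = x * (1 - x)
    return x * (1 - x)

rng = np.random.default_rng(1)

# Tiny illustrative training set (4 examples, 3 inputs, 1 output) --
# an assumption for the sketch, not something from the question.
training_inputs = np.array([[0, 0, 1],
                            [1, 1, 1],
                            [1, 0, 1],
                            [0, 1, 1]])
expected_outputs = np.array([[0, 1, 1, 0]]).T

weights = 2 * rng.random((3, 1)) - 1   # random starting weights in [-1, 1]

for _ in range(10000):
    # Forward pass
    actual_outputs = sigmoid(training_inputs @ weights)

    # Step 1: error between the expected and actual outputs
    error = expected_outputs - actual_outputs

    # Step 2: adjustments = inputs^T dot (error * Sigmoid derivative of outputs)
    adjustments = training_inputs.T @ (error * sigmoid_derivative(actual_outputs))

    # Step 3: apply the adjustments to the weights
    weights += adjustments

# Step 4 is the loop itself: repeat until the outputs are close to the expected ones.
print(sigmoid(training_inputs @ weights))   # should be close to [0, 1, 1, 0]
```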

Edit on Gradient Descent:

Gradient descent is a method of finding a good adjustment value for changing your weights during back-propagation; it works by following the derivative (gradient) of the error with respect to each weight.

Formula used in the gradient calculation (the Sigmoid derivative, where X is the Sigmoid output): f'(X) = X * (1 - X)

The same derivative in programmatic form:
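(A minimal sketch of that formula in code, assuming X is already the Sigmoid output rather than the raw input:)

```python
def sigmoid_derivative(X):
    # X is assumed to be the Sigmoid output, so the derivative is X * (1 - X)
    return X * (1 - X)
```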

Gradient Descent Explanation:

Gradient descent is a method for finding the best adjustment to a weight. It is necessary so that the best weight values can be found. During the back-propagation iterations, the further the actual output is from the expected output, the bigger the change to the weights. You can picture it as an inverted hill (a bowl): at each iteration the ball rolling down it moves quickly at first and then more slowly as it reaches the bottom.

(Image credit: Clairvoyant.)
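In update-rule form, the idea can be sketched like this (learning_rate and the concrete numbers are assumptions purely for illustration):

```python
learning_rate = 0.1            # assumed step-size hyperparameter
weight = 0.5
gradient = 2.0                 # d(error)/d(weight); a larger error gives a larger gradient
weight = weight - learning_rate * gradient   # larger gradient -> larger change to the weight
print(weight)                  # 0.3
```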

Stochastic gradient descent is a more advanced method, used when the best weight values are harder to find than in the standard gradient descent case. This might not be the best explanation, so for a much clearer explanation of stochastic gradient descent, refer to this video.
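As a rough illustration of the difference (a sketch under assumed names, not the video's presentation): plain gradient descent computes the gradient over the whole training set before each update, while stochastic gradient descent shuffles the data and updates after every single example:

```python
import random

def sgd_epoch(weights, dataset, learning_rate, gradient_fn):
    """One epoch of stochastic gradient descent (sketch).

    gradient_fn(weights, example) is an assumed helper that returns the
    gradient of the error for that single example with respect to each weight.
    """
    random.shuffle(dataset)                  # visit the examples in random order
    for example in dataset:
        grad = gradient_fn(weights, example)
        # Update after every example instead of after the whole dataset
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights
```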
