How does the Gradient function work in Backpropagation?

Question

In back-propagation, is the gradient of the loss w.r.t. layer L calculated using the gradient w.r.t. layer L-1? Or is the gradient of the loss w.r.t. layer L-1 calculated using the gradient w.r.t. layer L?

Solution

A gradient descent function is used in back-propagation to find the best value by which to adjust the weights. There are two common variants: gradient descent and stochastic gradient descent.

Gradient descent is a procedure that determines the best adjustment to apply to the weights. On each iteration it determines the amount by which the weights should be adjusted: the further the weights are from their best values, the bigger the adjustment will be. You can think of it as a ball rolling down a hill: the ball's velocity is the adjustment value, and the hill is the range of possible adjustment values. Essentially, you want the ball (the adjustment) to get as close as possible to the bottom of the hill (the best possible adjustment). The ball's velocity increases until it reaches the bottom of the hill, and the bottom of the hill is the best possible value. A more practical explanation can be found here.
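To make the idea concrete, here is a minimal sketch of a gradient-descent update on a toy one-parameter loss; the function and variable names are illustrative only and are not taken from the answer above.

```python
import numpy as np

def gradient_descent_step(weights, grad, learning_rate=0.1):
    """Move the weights one small step against the gradient of the loss."""
    return weights - learning_rate * grad

# Toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
for _ in range(50):
    grad = 2.0 * (w - 3.0)              # gradient of the loss w.r.t. w
    w = gradient_descent_step(w, grad)  # the step shrinks as w nears the minimum
print(w)                                # approaches 3.0, the "bottom of the hill"
```

Note how the step is large when w is far from the minimum and shrinks as w approaches it, which is the "ball slowing near the bottom of the hill" picture in the paragraph above.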

Stochastic gradient descent is a more involved version of gradient descent. It is used when a network may have a false-best adjustment value (a local optimum), where regular gradient descent won't find the best value, only a value it thinks is the best. This can be pictured as the ball rolling down two hills of different heights: it rolls down the first hill and reaches its bottom, thinking it has reached the best possible answer, but with stochastic gradient descent it would recognise that this position is not the best one, because the true best lies at the bottom of the second hill.
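Below is a minimal sketch of stochastic gradient descent for a small linear model with a squared-error loss, assuming made-up data and names for illustration; the point is that each update uses a single randomly chosen sample, so the steps are noisy, and that noise is what can carry the "ball" past a shallow first valley.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # noisy targets

w = np.zeros(3)
learning_rate = 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):          # one randomly chosen sample per update
        pred = X[i] @ w
        grad = 2.0 * (pred - y[i]) * X[i]      # gradient of (pred - y[i])**2 w.r.t. w
        w -= learning_rate * grad              # small, noisy step
print(w)                                       # ends up close to true_w
```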

(The original answer included a figure here: the left side shows where plain gradient descent would settle, and the right side shows the best possible value that stochastic gradient descent would find.) A more descriptive and practical version of this explanation can be found here.

And finally, to conclude my answer to your question: in back-propagation you calculate the gradient of the right-most weight matrix and adjust those weights accordingly, then you move one layer to the left, to L-1 (the next weight matrix), and repeat the step. In other words, you determine the gradient, adjust accordingly, and then move to the left.
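As a rough illustration of that right-to-left pass, here is a minimal sketch of back-propagation for a tiny fully-connected network with sigmoid activations and a squared-error loss; the layer sizes, variable names, and learning rate are assumptions made for the example, not details from the question.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]                              # assumed layer widths
Ws = [rng.normal(size=(m, n)) * 0.5 for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(sizes[0], 1))                # one input example
target = np.array([[1.0]])
learning_rate = 0.1

# Forward pass: keep every layer's activations for use in the backward pass.
activations = [x]
for W in Ws:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass: start at the right-most weight matrix and move left.
delta = (activations[-1] - target) * activations[-1] * (1 - activations[-1])
for l in range(len(Ws) - 1, -1, -1):              # layers L, L-1, ..., 1
    grad_W = delta @ activations[l].T             # gradient of the loss w.r.t. Ws[l]
    if l > 0:                                     # propagate the error one layer left
        delta = (Ws[l].T @ delta) * activations[l] * (1 - activations[l])
    Ws[l] -= learning_rate * grad_W               # adjust, then move to the left
```

The gradient for each weight matrix is built from the error signal of the layer to its right, which is why the pass has to sweep from the output layer back toward the input.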

I have also talked about this in detail in another question; it might help to check that one out.
