How does the Gradient function work in Backpropagation?


Problem Description


In backpropagation, is the gradient of the Loss w.r.t. layer L calculated using the gradient w.r.t. layer L-1? Or is the gradient of the Loss w.r.t. layer L-1 calculated using the gradient w.r.t. layer L?

Solution

A gradient descent function is used in back-propagation to find the best value to adjust the weights by. There are two common types of gradient descent: Gradient Descent and Stochastic Gradient Descent.

Gradient descent is a function that determines the best adjustment value to change the weights by. On each iteration it determines the amount the weights should be adjusted by; the further away from the best determined weight, the bigger the adjustment value will be. You can think of it as a ball rolling down a hill, the ball's velocity being the adjustment value and the hill being the possible adjustment values. Essentially, you want the ball (the adjustment value) to get as close as possible to the bottom of the hill (the best possible adjustment). The ball's velocity will increase until it reaches the bottom of the hill - the bottom of the hill is the best possible value. A more practical explanation can be found here.
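As a rough illustration of that adjustment rule (my own sketch, not part of the original answer), here is a minimal Python example assuming a single weight w and the toy loss (w - 3)**2: the gradient, and therefore the size of each adjustment, is largest far from the optimum and shrinks as w approaches it.

    # Toy gradient descent: loss(w) = (w - 3)**2 has its minimum at w = 3.
    learning_rate = 0.1
    w = 10.0                       # arbitrary starting weight
    for step in range(50):
        grad = 2 * (w - 3)         # dLoss/dw: large far from the optimum, small near it
        w -= learning_rate * grad  # the adjustment is proportional to the gradient
    print(w)                       # ends up very close to 3.0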

Stochastic gradient descent is a more complicated version of the gradient descent function. It is used for a neural network that may have a false best adjustment value (a local minimum), where regular gradient descent won't find the best value, but only a value it thinks is the best. This can be pictured as the ball rolling down two hills of different heights: it rolls down the first hill and reaches its bottom, thinking it has found the best possible answer, but with stochastic gradient descent it would know that this position is not the best one - the real best is the bottom of the second hill.
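A hedged sketch of the stochastic variant, assuming a made-up linear model y = w*x with toy data: instead of one update per pass over the whole dataset, the weight is nudged after every individual sample, so the path is noisier; this per-sample noise is what the two-hill picture above relies on (the toy problem here has only one "hill", so the sketch only shows the update mechanics).

    import random

    data = [(x, 2.0 * x) for x in range(1, 6)]   # toy samples with true w = 2
    w, learning_rate = 0.0, 0.01

    for epoch in range(20):
        random.shuffle(data)
        for x, y in data:                        # one noisy update per sample
            grad = 2 * (w * x - y) * x           # d/dw of the squared error on this sample
            w -= learning_rate * grad
    print(w)                                     # approaches 2.0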

The left is what gradient descent would output; the right is what stochastic gradient descent would find (the best possible value). A more descriptive and practical version of this explanation can be found here.

And finally, to conclude my answer to your question: in back-propagation you calculate the gradient of the right-most weight matrix and adjust those weights accordingly, then you move one layer to the left, to L-1 (the next weight matrix), and repeat the step. In other words, you determine the gradient, adjust accordingly, and then move to the left.
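To make that right-to-left order concrete, here is a small numpy sketch (my own illustration with made-up shapes and a ReLU hidden layer, not code from the original answer). The gradient for the right-most weight matrix W2 is computed first, and the gradient signal passed to the earlier layer - and hence the gradient for W1 - is built from the quantity already computed at the later layer, which is the answer to the question above.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))       # batch of 4 inputs with 3 features
    y = rng.normal(size=(4, 1))       # targets
    W1 = rng.normal(size=(3, 5))      # earlier layer (L-1)
    W2 = rng.normal(size=(5, 1))      # right-most layer (L)

    # forward pass
    h = np.maximum(0, x @ W1)         # ReLU hidden activations
    y_hat = h @ W2
    loss = np.mean((y_hat - y) ** 2)

    # backward pass: start at the right-most layer
    d_y_hat = 2 * (y_hat - y) / y.shape[0]   # dLoss/d y_hat
    dW2 = h.T @ d_y_hat                      # gradient for layer L
    d_h = d_y_hat @ W2.T                     # gradient passed one layer to the left
    d_h[h <= 0] = 0                          # back through the ReLU
    dW1 = x.T @ d_h                          # gradient for layer L-1, built from layer L's gradient

    # adjust both weight matrices (plain gradient descent step)
    learning_rate = 0.01
    W2 -= learning_rate * dW2
    W1 -= learning_rate * dW1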

I have also talked about this in detail in another question; it might help to check that one out.

