What is the difference between Gradient Descent and Newton's Gradient Descent?
Question
I understand what Gradient Descent does. Basically, it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand what the actual difference is between plain gradient descent and Newton's method.
From Wikipedia, I read this short line: "Newton's method uses curvature information to take a more direct route." What does this mean intuitively?
Recommended Answer
At a local minimum (or maximum) x, the derivative of the target function f vanishes: f'(x) = 0 (assuming sufficient smoothness of f).
Gradient descent tries to find such a minimum x by using information from the first derivative of f: it simply follows the steepest descent from the current point. This is like rolling a ball down the graph of f until it comes to rest (while neglecting inertia).
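The "follow the steepest descent" idea can be sketched in a few lines. This is a minimal illustration, not code from the original answer: the example function f(x) = (x - 3)², the learning rate, and the stopping tolerance are all assumptions chosen for demonstration.

```python
def gradient_descent(f_prime, x0, lr=0.1, tol=1e-8, max_iter=10000):
    """Repeatedly step against the derivative until the step nearly vanishes."""
    x = x0
    for _ in range(max_iter):
        step = lr * f_prime(x)  # move downhill, scaled by the learning rate
        x -= step
        if abs(step) < tol:     # f'(x) ~ 0: we are near a stationary point
            break
    return x

# Example: f(x) = (x - 3)**2, so f'(x) = 2*(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Note that each iteration only ever sees the slope at the current point, so the ball creeps toward x = 3 in many small steps whose size depends on the learning rate.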
Newton's method tries to find a point x satisfying f'(x) = 0 by approximating f' with a linear function g and then solving for the root of that function explicitly (this is called Newton's root-finding method). The root of g is not necessarily the root of f', but it is under many circumstances a good guess (the Wikipedia article on Newton's method for root finding has more information on convergence criteria). While approximating f', Newton's method makes use of f'' (the curvature of f). This means it has higher requirements on the smoothness of f, but it also means that (by using more information) it often converges faster.