What are alternatives to Gradient Descent?

Question

Gradient descent has the problem of local minima. We might need to run gradient descent an exponential number of times to find the global minimum.
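For example, here is a minimal sketch of the problem (the 1-D function, starting point, and learning rate are arbitrary illustrative choices): plain gradient descent started in the wrong basin settles into the shallower local minimum and never sees the deeper one.

```python
# Illustrative 1-D example: f has a local minimum near x ~ 0.96 and a
# deeper global minimum near x ~ -1.04 (the function is made up for
# demonstration purposes).
def f(x):
    return x**4 - 2 * x**2 + 0.3 * x

def grad_f(x):
    return 4 * x**3 - 4 * x + 0.3

x = 1.5           # start in the basin of the shallower (local) minimum
lr = 0.01         # learning rate, an arbitrary choice
for _ in range(1000):
    x -= lr * grad_f(x)

print(x, f(x))    # converges to the local minimum, not the global one
```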

Can anybody tell me about alternatives to gradient descent, along with their pros and cons?

Thanks.

Answer

This is more a problem with the function being minimized than with the method used. If finding the true global minimum is important, then use a method such as simulated annealing. This will be able to find the global minimum, but may take a very long time to do so.
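To make that concrete, here is a minimal, non-optimised sketch of simulated annealing on the same kind of 1-D function as in the question; the cooling schedule and the Gaussian proposal scale below are arbitrary illustrative choices, not canonical values.

```python
import math
import random

def f(x):
    return x**4 - 2 * x**2 + 0.3 * x   # same illustrative function as above

def simulated_annealing(x0, n_steps=10_000, t0=1.0, scale=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for i in range(n_steps):
        t = t0 / (1 + i)                    # simple cooling schedule (arbitrary choice)
        x_new = x + rng.gauss(0, scale)     # random proposal
        f_new = f(x_new)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / t), so the search can escape local basins.
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / max(t, 1e-12)):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

print(simulated_annealing(1.5))  # can cross the barrier and find x ~ -1.04
```

Unlike the gradient descent run above, the random uphill moves let the search leave the shallow basin, at the cost of many more function evaluations.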

In the case of neural nets, local minima are not necessarily that much of a problem. Some of the local minima are due to the fact that you can get a functionally identical model by permuting the hidden layer units, or negating the input and output weights of the network, etc. Also, if a local minimum is only slightly non-optimal, then the difference in performance will be minimal and so it won't really matter. Lastly, and this is an important point, the key problem in fitting a neural network is over-fitting, so aggressively searching for the global minimum of the cost function is likely to result in over-fitting and a model that performs poorly.
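The permutation symmetry mentioned above is easy to check numerically. In this small sketch (the network shapes and weights are made up purely for the demonstration), permuting the hidden units together with the matching rows and columns of the weight matrices gives a different point in weight space but exactly the same function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer net with tanh activations (illustrative shapes).
W1 = rng.normal(size=(5, 3))   # input -> hidden
b1 = rng.normal(size=5)
W2 = rng.normal(size=(1, 5))   # hidden -> output

def net(x, W1, b1, W2):
    return W2 @ np.tanh(W1 @ x + b1)

x = rng.normal(size=3)
perm = rng.permutation(5)      # reorder the hidden units

# Permuting the rows of W1/b1 and the columns of W2 in the same way
# yields a functionally identical model at a different weight vector.
print(np.allclose(net(x, W1, b1, W2),
                  net(x, W1[perm], b1[perm], W2[:, perm])))  # True
```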

Adding a regularisation term, e.g. weight decay, can help to smooth out the cost function, which can reduce the problem of local minima a little, and is something I would recommend anyway as a means of avoiding over-fitting.
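As a concrete sketch of weight decay, here is a squared-error cost with an L2 penalty added, shown for a linear model to keep it short; `lam` is an assumed regularisation strength that would normally be chosen by validation, and the data below is synthetic.

```python
import numpy as np

def cost(w, X, y, lam=0.01):
    """Mean squared error plus an L2 weight-decay penalty."""
    r = X @ w - y
    return np.mean(r**2) + lam * np.sum(w**2)

def grad(w, X, y, lam=0.01):
    # Gradient of the penalised cost; the lam term pulls weights toward zero.
    r = X @ w - y
    return 2 * (X.T @ r) / len(y) + 2 * lam * w

# Plain gradient descent on the regularised cost, with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
for _ in range(2000):
    w -= 0.05 * grad(w, X, y)
print(w)  # close to the true weights, slightly shrunk toward zero
```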

The best method, however, of avoiding local minima in neural networks is to use a Gaussian process model (or a radial basis function neural network), which has fewer problems with local minima.
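One reason an RBF network sidesteps the issue: if the basis-function centres and widths are fixed in advance (e.g. chosen from the training data), fitting the output weights becomes a linear least-squares problem, which is convex and has a single global minimum. A minimal sketch under those assumptions (the number of centres and the width are illustrative choices):

```python
import numpy as np

def rbf_design(X, centres, width):
    # Gaussian basis functions; centres and width are fixed beforehand,
    # which is what makes the remaining fit convex.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

centres = X[rng.choice(len(X), 20, replace=False)]  # 20 data points as centres
Phi = rbf_design(X, centres, width=1.0)

# With the basis fixed, the output weights solve a linear least-squares
# problem: a convex objective with one global minimum, no restarts needed.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.mean((Phi @ w - y) ** 2))
```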
