Loss suddenly increases with Adam Optimizer in Tensorflow


Problem description

I am using a CNN for a regression task. I use TensorFlow and the optimizer is Adam. The network seems to converge perfectly fine until one point, where the loss suddenly increases along with the validation error. Here are the loss plots for the labels and the weights, shown separately (the optimizer is run on their sum).

I use L2 loss for weight regularization and also for the labels. I apply some randomness to the training data. I am currently trying RMSProp to see if the behavior changes, but it takes at least 8 hours to reproduce the error.
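
For concreteness, here is a minimal sketch of that kind of setup, not the poster's actual code: a small CNN regressor whose total loss is the L2 label loss (MSE) plus L2 weight-regularization terms, minimized with Adam. The layer sizes, regularization factor, and dummy data are all illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Illustrative regularization strength (not from the question).
l2 = tf.keras.regularizers.l2(1e-4)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu",
                           kernel_regularizer=l2, input_shape=(32, 32, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, kernel_regularizer=l2),  # single regression output
])

# Keras adds the kernel-regularization losses to the compiled loss, so Adam
# effectively minimizes the sum of the label loss and the weight loss.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# Dummy data just to show the training call.
x = np.random.rand(64, 32, 32, 1).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=2, batch_size=16, validation_split=0.25)
```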

I would like to understand how this can happen. I hope you can help me.

Recommended answer

My experience over the last few months is the following: Adam is very easy to use because you don't have to play with the initial learning rate very much, and it almost always works. However, when it comes to convergence, Adam does not really settle on a solution but keeps jittering around at higher iteration counts, while SGD gives an almost perfectly shaped loss plot and seems to converge better in later iterations. But changing small parts of the setup requires adjusting the SGD parameters, or you will end up with NaNs... For experiments on architectures and general approaches I favor Adam, but if you want to get the best version of one chosen architecture you should use SGD, and at least compare the solutions.
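
As a sketch of that comparison (not the answerer's code), one could train the same architecture once with Adam and once with a hand-tuned SGD setup and compare the resulting loss curves. Here `build_model` is a hypothetical factory that returns a freshly initialized copy of the network, and the learning rates and momentum are placeholder values to be tuned.

```python
import tensorflow as tf

def compare_optimizers(build_model, x, y, epochs=50):
    optimizers = {
        "adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
        # SGD typically needs a hand-tuned rate and momentum; a rate that is
        # too large for a changed setup is what tends to end in NaNs.
        "sgd": tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    }
    histories = {}
    for name, opt in optimizers.items():
        model = build_model()                 # fresh weights for each run
        model.compile(optimizer=opt, loss="mse")
        histories[name] = model.fit(x, y, epochs=epochs,
                                    validation_split=0.2, verbose=0)
    # Compare histories["adam"].history vs histories["sgd"].history afterwards.
    return histories
```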

I also noticed that a good initial SGD setup (learning rate, weight decay, etc.) converges as fast as using Adam, at least for my setup. Hope this helps some of you!

Please note that the effects in my initial question are NOT normal, even with Adam. It seems like I had a bug, but I can't really remember what the issue was.

