How does the epsilon hyperparameter affect tf.train.AdamOptimizer?

Problem description

When I set epsilon=10e-8, AdamOptimizer doesn't work. When I set it to 1, it works just fine.

Recommended answer

t <- t + 1
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g

where g is the gradient

variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
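
To make the pseudocode concrete, here is a minimal NumPy sketch of a single Adam step. It is a direct transcription of the formulas above, not TensorFlow's actual implementation; the function name adam_step is hypothetical:

import numpy as np

def adam_step(variable, g, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # One Adam update, transcribed from the pseudocode above.
    t = t + 1
    # Bias-corrected step size for step t.
    lr_t = learning_rate * np.sqrt(1 - beta2**t) / (1 - beta1**t)
    m = beta1 * m + (1 - beta1) * g        # 1st-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # 2nd-moment estimate
    # Parameter update; epsilon keeps the denominator away from zero.
    variable = variable - lr_t * m / (np.sqrt(v) + epsilon)
    return variable, m, v, t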

Epsilon is there to avoid a divide-by-zero error in the update above when the gradient, and hence v_t, is almost zero. So ideally epsilon should be a small value. But a small epsilon in the denominator makes the weight updates larger, and with subsequent normalization larger weights will always be normalized to 1.

So, I guess that when you train with a small epsilon, the optimizer will become unstable.
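
To see this numerically, here is a small check reusing the adam_step sketch above, comparing the size of the very first update for a near-zero gradient under the two epsilon values from the question (the printed numbers are approximate):

import numpy as np

g = np.array([1e-4])  # a near-zero gradient
for eps in (10e-8, 1.0):  # the two values from the question
    w, m, v, t = np.array([1.0]), np.zeros(1), np.zeros(1), 0
    w_new, m, v, t = adam_step(w, g, m, v, t, epsilon=eps)
    print(f"epsilon={eps:g}: step size = {abs(w_new - w)[0]:.2e}")

# Approximate output:
#   epsilon=1e-07: step size = 9.69e-04
#   epsilon=1: step size = 3.16e-09

With the tiny epsilon, the very first step is already almost the full learning rate (0.001) even though the gradient is only 1e-4, because m_t / sqrt(v_t) is roughly scale-invariant; with epsilon = 1 the step shrinks along with the gradient.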

The trade-off is that the bigger you make epsilon (and hence the denominator), the smaller the weight updates are, and thus the slower training will progress. Most of the time you want the denominator to be able to get small. Usually an epsilon value greater than 10e-4 performs better.

In general, though, the default value of 1e-8 for epsilon might not be a good default. For example, when training an Inception network on ImageNet, a current good choice is 1.0 or 0.1. Check here.
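
Practically, epsilon is just a constructor argument of tf.train.AdamOptimizer (TF 1.x API), so trying a larger value is a one-line change. In this sketch, loss stands in for your own model's loss tensor:

import tensorflow as tf

# The default epsilon is 1e-8; per the note above, 0.1 or 1.0 can
# work better for some models (e.g. Inception on ImageNet).
optimizer = tf.train.AdamOptimizer(learning_rate=0.001,
                                   beta1=0.9, beta2=0.999,
                                   epsilon=0.1)
train_op = optimizer.minimize(loss)  # loss: your model's loss tensor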
