How should the learning rate change as the batch size changes?

Question

When I increase or decrease the size of the mini-batch used in SGD, should I change the learning rate? If so, how?

For reference, I was discussing this with someone who said that when the batch size is increased, the learning rate should be decreased to some extent.

My understanding is that when I increase the batch size, the computed average gradient is less noisy, so I can either keep the same learning rate or increase it.

Also, if I use an adaptive learning rate optimizer such as Adam or RMSProp, then I guess I can leave the learning rate untouched.

Please correct me if I am mistaken, and share any insight on this.

Recommended answer

Theory suggests that when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant. See page 5 of A. Krizhevsky, "One weird trick for parallelizing convolutional neural networks": https://arxiv.org/abs/1404.5997
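One way to read the variance argument, as a sketch rather than a quotation from the paper: assume the per-example gradients are independent with variance \sigma^2, write \eta for the learning rate, B for the batch size, and \hat{g}_B for the mini-batch average gradient. These symbols are introduced here for illustration only.

```latex
\operatorname{Var}\left[\eta\,\hat{g}_B\right] = \frac{\eta^2 \sigma^2}{B}
\qquad\Longrightarrow\qquad
\operatorname{Var}\left[\sqrt{k}\,\eta\,\hat{g}_{kB}\right]
  = \frac{(\sqrt{k}\,\eta)^2 \sigma^2}{kB}
  = \frac{\eta^2 \sigma^2}{B}
```

Under that independence assumption, scaling the learning rate by sqrt(k) exactly cancels the factor of k by which the larger batch reduces the variance of the averaged gradient.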

However, recent experiments with large mini-batches suggest a simpler linear scaling rule, i.e., multiply your learning rate by k when using a mini-batch size of kN. See P. Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour": https://arxiv.org/abs/1706.02677
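The two rules mentioned above can be compared with a small helper. This is only a sketch: the function name `scale_learning_rate`, its parameters, and the example values (a base learning rate of 0.1 at batch size 256) are hypothetical and not taken from either paper.

```python
import math


def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Return a learning rate adjusted for a new batch size.

    Hypothetical helper for illustration. `rule` selects between the two
    heuristics discussed in the answer:
      - "linear": multiply the learning rate by k
      - "sqrt":   multiply the learning rate by sqrt(k)
    where k = new_batch_size / base_batch_size.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")

    k = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * math.sqrt(k)
    raise ValueError(f"unknown rule: {rule!r}")


# Example values are assumptions chosen only to show the difference.
linear_lr = scale_learning_rate(0.1, 256, 1024, rule="linear")
sqrt_lr = scale_learning_rate(0.1, 256, 1024, rule="sqrt")
print(f"linear rule: {linear_lr:.3f}, sqrt rule: {sqrt_lr:.3f}")
```

With k = 4 the linear rule quadruples the learning rate while the square-root rule only doubles it, so the two heuristics diverge quickly as the batch size grows. Which one works better for a given model is an empirical question.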

I would say that with Adam, Adagrad, and other adaptive optimizers, the learning rate may remain the same if the batch size does not change substantially.
