What is batch size in Caffe or convnets


Question

I thought that batch size only mattered for performance: the bigger the batch, the more images are computed at the same time while training my net. But I noticed that if I change the batch size, my net's accuracy changes as well. So I don't understand what batch size really is. Can someone explain what batch size means?

Answer

Caffe is trained using Stochastic Gradient Descent (SGD): that is, at each iteration it computes a (stochastic) estimate of the gradient of the loss with respect to the parameters over the training data, and updates the parameters by taking a step against that gradient (downhill).
Now, if you write out the equations of the gradient over the training data, you will notice that computing the gradient exactly requires evaluating all of your training data at every iteration. This is prohibitively time-consuming, especially as the training set grows larger and larger.
To overcome this, SGD approximates the exact gradient in a stochastic manner by sampling only a small portion of the training data at each iteration. This small portion is the batch.
Thus, the larger the batch size, the more accurate the gradient estimate at each iteration.
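
To make the sampling concrete, here is a minimal NumPy sketch of mini-batch SGD on a toy problem (this is illustrative code, not Caffe's implementation; the data, learning rate, and batch_size values are made up):

```python
# Minimal mini-batch SGD sketch (toy linear regression, not Caffe code).
# batch_size controls how many samples contribute to each gradient estimate.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2*x + noise
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w = 0.0            # single parameter to learn
lr = 0.1           # learning rate
batch_size = 32    # samples drawn per iteration

for it in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # sample one batch
    xb, yb = X[idx, 0], y[idx]
    grad = np.mean(2.0 * (w * xb - yb) * xb)  # gradient of the MSE on the batch only
    w -= lr * grad                            # step against the estimated gradient

print(w)  # approaches 2.0; a larger batch_size gives less noisy updates
```

Increasing batch_size in this sketch averages the gradient over more samples per step, which lowers the variance of each update at the cost of more computation per iteration.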

TL;DR: the batch size affects the accuracy of the estimated gradient at each iteration. Changing the batch size therefore affects the "path" the optimization takes and may change the outcome of the training process.
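
Since the question mentions Caffe specifically: in Caffe the batch size is declared per data layer in the network prototxt (the batch_size field of data_param). A small sketch for inspecting it with pycaffe's protobuf bindings, assuming pycaffe is installed and your net definition is saved as train_val.prototxt (the file name is just an example):

```python
# Sketch: list the batch_size configured for each Data layer in a Caffe net definition.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open("train_val.prototxt") as f:   # example path to your net prototxt
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.type == "Data":
        # data_param.batch_size = samples fed per forward/backward pass
        print(layer.name, layer.data_param.batch_size)
```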

Update:
At the ICLR 2018 conference an interesting work was presented:
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le, Don't Decay the Learning Rate, Increase the Batch Size.
This work relates the effects of changing the batch size and of changing the learning rate.
