What is batch size in Caffe or convnets


Problem Description


I thought that batch size only mattered for performance: the bigger the batch, the more images are computed at the same time to train my net. But I realized that if I change my batch size, my net's accuracy gets better. So I do not understand what batch size really is. Can someone explain to me what batch size is?

Answer


Caffe is trained using Stochastic Gradient Descent (SGD): that is, at each iteration it computes the (stochastic) gradient of the loss w.r.t. the parameters, evaluated on the training data, and takes a step (i.e., changes the parameters) along the negative gradient.
Now, if you write out the gradient equations, you'll notice that computing the gradient exactly requires evaluating all of your training data at each iteration: this is prohibitively time consuming, especially as the training set gets bigger and bigger.
To overcome this, SGD approximates the exact gradient in a stochastic manner, by sampling only a small portion of the training data at each iteration. This small portion is the batch.
Thus, the larger the batch size, the more accurate the gradient estimate at each iteration.
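
To make this concrete, here is a minimal sketch of minibatch SGD on a toy least-squares problem, in plain NumPy rather than Caffe's actual solver; all names (`X`, `y`, `sgd`, `batch_size`, `lr`) are illustrative and not part of any Caffe API. Each iteration samples a batch and descends along the gradient estimated from that batch alone, so larger batches give a lower-variance estimate and typically land closer to the true weights:

```python
# Minimal minibatch-SGD sketch (not Caffe's implementation) illustrating
# how batch_size controls the quality of the gradient estimate.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # 1000 training samples, 5 features
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)  # noisy linear targets

def sgd(batch_size, lr=0.05, iters=500):
    w = np.zeros(5)
    for _ in range(iters):
        # Sample a small portion (the batch) instead of all the data.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Stochastic gradient of the mean squared error over the batch;
        # averaging over more samples lowers the variance of this estimate.
        grad = (2.0 / batch_size) * Xb.T @ (Xb @ w - yb)
        w -= lr * grad
    return w

for bs in (1, 16, 256):
    w = sgd(bs)
    print(f"batch_size={bs:4d}  error={np.linalg.norm(w - true_w):.4f}")
```

Running this typically shows the recovered weights getting closer to `true_w` as the batch size grows, which is exactly the "more accurate gradient estimate" effect described above.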


TL;DR: the batch size affects the accuracy of the estimated gradient at each iteration; changing the batch size therefore affects the "path" the optimization takes and may change the results of the training process.


Update:
At the ICLR 2018 conference, an interesting work was presented:
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le, "Don't Decay the Learning Rate, Increase the Batch Size".
This work relates the effects of changing the batch size and changing the learning rate.
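
As a rough illustration of the paper's idea (a sketch under my own reading, not the authors' code): decaying the learning rate by some factor and multiplying the batch size by that same factor both shrink the noise scale of the SGD updates in a similar way, so one schedule can be traded for the other. The stage boundaries, factor, and starting values below are purely illustrative:

```python
# Hedged sketch of the schedule idea from Smith et al. (ICLR 2018):
# instead of shrinking the learning rate at each stage, grow the batch
# size by the same factor, keeping lr / batch_size (which sets the scale
# of the SGD noise) roughly constant. All numbers are illustrative.
def lr_decay_schedule(stage, lr0=0.1, batch0=128, factor=5):
    # Conventional schedule: decay the learning rate each stage.
    return lr0 / factor**stage, batch0

def batch_growth_schedule(stage, lr0=0.1, batch0=128, factor=5):
    # Alternative per the paper: grow the batch size instead.
    return lr0, batch0 * factor**stage

for stage in range(3):
    lr_a, b_a = lr_decay_schedule(stage)
    lr_b, b_b = batch_growth_schedule(stage)
    print(f"stage {stage}: decay-lr -> lr={lr_a:.4f}, batch={b_a}; "
          f"grow-batch -> lr={lr_b:.4f}, batch={b_b}")
```

Note that in both schedules the ratio of learning rate to batch size falls by the same factor at each stage, which is why the paper argues the two are largely interchangeable (the batch-growth variant additionally allowing more parallelism).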

