What is a batch in TensorFlow?


Problem description


The introductory documentation I am reading (TOC here) uses the term "batch" (for instance, here) without defining it.

Solution

Let's say you want to do digit recognition (MNIST) and you have defined your network architecture (a CNN). Now, you can start feeding the images from the training data one by one to the network: get the prediction (up to this step it's called doing inference), compute the loss, compute the gradient, then update the parameters of your network (i.e. weights and biases), and then proceed with the next image... This way of training the model is sometimes called online learning.
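The per-example loop above can be sketched in a few lines. This is a minimal NumPy sketch, not the answer's actual setup: the single linear layer and the synthetic data are placeholders standing in for a real CNN and MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 flattened 28x28 "images", 10-class one-hot labels.
X = rng.normal(size=(200, 784))
Y = np.eye(10)[rng.integers(0, 10, size=200)]

# A single linear layer: the network's parameters (weights and biases).
W = np.zeros((784, 10))
b = np.zeros(10)
lr = 0.01

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Online learning: feed one image at a time, predict, compute loss and
# gradient, update the parameters, then move on to the next image.
for x, y in zip(X, Y):
    p = softmax(x @ W + b)                 # inference: get the prediction
    loss = -np.log(p[y.argmax()] + 1e-12)  # cross-entropy loss for this example
    g = p - y                              # gradient of the loss w.r.t. the logits
    W -= lr * np.outer(x, g)               # update the weights...
    b -= lr * g                            # ...and the biases, then next image
```

Note that every parameter update here is driven by a single example, which is exactly why the gradients are noisy compared to the batched version discussed next.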

But you want the training to be faster, the gradients to be less noisy, and you also want to take advantage of the power of GPUs, which are efficient at array operations (nD-arrays, to be specific). So what you do instead is feed in, say, 100 images at a time (the choice of this size is up to you, i.e. it's a hyperparameter, and it depends on your problem too). For instance, take a look at the picture below (Author: Martin Gorner):

Here, since you're feeding in 100 images (28x28) at a time (instead of 1, as in the online training case), the batch size is 100. Oftentimes this is called the mini-batch size, or simply a mini-batch.
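In terms of shapes, feeding a batch of 100 flattened 28x28 images through one fully connected layer looks like this (a minimal NumPy sketch; the random weights are illustrative, only the shapes matter here):

```python
import numpy as np

batch_size = 100
X = np.random.rand(batch_size, 28 * 28)  # 100 flattened 28x28 images: (100, 784)
W = np.random.rand(28 * 28, 10)          # weights for 10 digit classes: (784, 10)
b = np.random.rand(10)                   # one bias per class: (10,)

logits = X @ W + b                       # one matmul handles the whole batch
print(logits.shape)                      # (100, 10): one prediction per image
```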


Also see the picture below (Author: Martin Gorner):

Now, the matrix multiplication will work out perfectly fine, and you will also be taking advantage of highly optimized array operations, hence achieving faster training times.
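To see that the batched matrix multiplication computes exactly the same predictions as the one-image-at-a-time loop, only through a single optimized call, here is a small NumPy check (with made-up random weights):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 784))           # a batch of 100 flattened images
W = rng.normal(size=(784, 10))
b = rng.normal(size=10)

batched = X @ W + b                       # one highly optimized array operation
looped = np.stack([x @ W + b for x in X]) # same math, one image at a time
print(np.allclose(batched, looped))       # True: identical predictions
```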

If you observe the picture above, it doesn't matter much whether you feed 100, 256, 2048, or 10000 images (the batch size), as long as the batch fits in the memory of your (GPU) hardware. You'll simply get that many predictions.

But please keep in mind that this batch size influences the training time, the error you achieve, the gradient shifts, etc. There is no general rule of thumb as to which batch size works best. Just try a few sizes and pick the one that works best for you. But try not to use large batch sizes, since they tend to overfit the data. People commonly use mini-batch sizes of 32, 64, 128, 256, 512, 1024, or 2048.
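Trying a few sizes usually just means re-slicing the same training set. Here is a minimal sketch of a mini-batch iterator (the `minibatches` helper and the toy shapes are my own illustration, not from the answer):

```python
import numpy as np

def minibatches(X, Y, batch_size, rng):
    """Yield shuffled (inputs, labels) pairs of at most batch_size examples."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], Y[sel]

rng = np.random.default_rng(0)
X = np.zeros((1000, 784))                 # toy training set of 1000 "images"
Y = np.zeros((1000, 10))
for bs in (32, 64, 128):                  # try a few sizes, keep what works best
    n_batches = sum(1 for _ in minibatches(X, Y, bs, rng))
    print(bs, n_batches)                  # ceil(1000 / bs) batches per epoch
```

Note the last batch of an epoch may be smaller than `batch_size` when the dataset size isn't an exact multiple of it, which is usually harmless.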


Bonus: To get a good grasp of how far you can push this batch size, give this paper a read: One weird trick for parallelizing convolutional neural networks.

