Mini Batch Gradient Descent, adam and epochs
Question
I am taking a course on Deep Learning in Python and I am stuck on the following lines of an example:
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
From the definitions I know, 1 epoch = going through all training examples once to do one weight update.
batch_size is used by the optimizer to divide the training examples into mini batches. Each mini batch is of size batch_size.
I am not familiar with adam optimization, but I believe it is a variation of GD or mini-batch GD. Gradient Descent has one big batch (all the data), but multiple epochs. Mini Batch Gradient Descent uses multiple mini batches, but only 1 epoch.
Then, how come the code has both multiple mini batches and multiple epochs? Does epoch in this code have a different meaning than the definition above?
Answer
Your understanding of epoch and batch_size seems correct.
In more detail:
An epoch corresponds to one whole training dataset sweep. This sweep can be performed in several ways.
- Batch mode: The gradient of the loss over the whole training dataset is used to update the model weights. One optimisation iteration corresponds to one epoch.
- Stochastic mode: The gradient of the loss over one training dataset point is used to update the model weights. If there are N examples in the training dataset, N optimisation iterations correspond to one epoch.
- Mini-batch mode: The gradient of the loss over a small sample of points from the training dataset is used to update the model weights. The sample is of size batch_size. If there are N_examples examples in the training dataset, N_examples/batch_size optimisation iterations correspond to one epoch.
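The iteration counts above can be sketched with a small helper (a hypothetical function, not part of Keras; note that frameworks typically round the mini-batch count up so the final partial batch is still used):

```python
import math

def iterations_per_epoch(n_examples, mode, batch_size=None):
    """Number of weight-update iterations in one epoch for each sweep mode."""
    if mode == "batch":
        return 1                       # one update over the full dataset
    if mode == "stochastic":
        return n_examples              # one update per training example
    if mode == "mini-batch":
        # one update per mini batch; the last batch may be smaller
        return math.ceil(n_examples / batch_size)
    raise ValueError(f"unknown mode: {mode}")

print(iterations_per_epoch(1000, "batch"))                      # 1
print(iterations_per_epoch(1000, "stochastic"))                 # 1000
print(iterations_per_epoch(1000, "mini-batch", batch_size=32))  # 32
```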
In your case (epochs=100, batch_size=32), the regressor would sweep the whole dataset 100 times, with mini data batches of size 32 (i.e. mini-batch mode).
If I assume your dataset size is N_examples, the regressor would perform N_examples/32 model weight optimisation iterations per epoch.
So for 100 epochs: 100 * N_examples/32 model weight optimisation iterations.
All in all, having epochs > 1 and having batch_size > 1 are compatible.
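To see both at work together, here is a minimal sketch of mini-batch gradient descent on a toy linear regression in NumPy (plain SGD with made-up data and learning rate, not the adam optimizer itself): the outer loop runs multiple epochs and the inner loop runs multiple mini batches per epoch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 1.0           # ground truth: weight 3, bias 1

w, b = 0.0, 0.0
lr, batch_size, epochs = 0.1, 32, 100

for epoch in range(epochs):                     # epochs > 1 ...
    perm = rng.permutation(len(X))              # reshuffle each epoch
    for start in range(0, len(X), batch_size):  # ... and batch_size > 1
        idx = perm[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        pred = w * xb + b
        grad_w = 2 * np.mean((pred - yb) * xb)  # d(MSE)/dw on this mini batch
        grad_b = 2 * np.mean(pred - yb)         # d(MSE)/db on this mini batch
        w -= lr * grad_w
        b -= lr * grad_b

print(round(w, 2), round(b, 2))
```

Each epoch performs 256/32 = 8 weight updates, so 100 epochs perform 800 updates in total, and the fit converges to the true parameters.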