Mini Batch Gradient Descent, Adam and epochs


Question

I am taking a course on Deep Learning in Python and I am stuck on the following lines of an example:

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)

From the definitions I know, 1 epoch = going through all training examples once to do one weight update.

batch_size is used by the optimizer to divide the training examples into mini-batches. Each mini-batch is of size batch_size.

I am not familiar with Adam optimization, but I believe it is a variation of GD or mini-batch GD. Gradient Descent uses one big batch (all the data) but multiple epochs. Mini-batch Gradient Descent uses multiple mini-batches, but only 1 epoch.

Then, how come the code has both multiple mini-batches and multiple epochs? Does epoch in this code have a different meaning than the definition above?

Answer

Your understanding of epoch and batch_size seems correct.

In more detail:

An epoch corresponds to one whole sweep over the training dataset. This sweep can be performed in several ways.

  • Batch mode: the gradient of the loss over the whole training dataset is used to update the model weights. One optimisation iteration corresponds to one epoch.
  • Stochastic mode: the gradient of the loss over a single training example is used to update the model weights. If there are N examples in the training dataset, N optimisation iterations correspond to one epoch.
  • Mini-batch mode: the gradient of the loss over a small sample of points from the training dataset is used to update the model weights. The sample is of size batch_size. If there are N_examples examples in the training dataset, N_examples/batch_size optimisation iterations correspond to one epoch (see the sketch after this list).
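To make the three modes concrete, here is a minimal NumPy sketch of the mini-batch case on a synthetic linear-regression problem. All names, sizes and the learning rate below are made up for illustration (the original question uses Keras rather than a hand-written loop); setting batch_size to N_examples would give batch mode, and setting it to 1 would give stochastic mode:

import numpy as np

# Toy linear-regression data (synthetic, for illustration only).
rng = np.random.default_rng(0)
N_examples = 96
X = rng.normal(size=(N_examples, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N_examples)

w = np.zeros(3)      # model weights
lr = 0.1             # learning rate
epochs = 5
batch_size = 32      # N_examples -> batch mode, 1 -> stochastic mode

for epoch in range(epochs):
    # One epoch = one full sweep over the training data.
    perm = rng.permutation(N_examples)
    for start in range(0, N_examples, batch_size):
        idx = perm[start:start + batch_size]
        X_b, y_b = X[idx], y[idx]
        # Gradient of the mean-squared-error loss over this mini-batch.
        grad = 2 * X_b.T @ (X_b @ w - y_b) / len(idx)
        w -= lr * grad   # one optimisation iteration (weight update)
    # Here N_examples/batch_size = 3 weight updates happen per epoch.

print(w)  # approaches true_w after a few epochs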

In your case (epochs=100, batch_size=32), the regressor would sweep the whole dataset 100 times, with mini data batches of size 32 (i.e. mini-batch mode).

If I assume your dataset size is N_examples, the regressor would perform N_examples/32 model weight optimisation iterations per epoch.

So for 100 epochs: 100 * N_examples/32 model weight optimisation iterations.
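As a quick sanity check, assuming a hypothetical dataset of 1,000 training examples (a made-up number; substitute your own), the counts work out as follows. Keras still processes a final partial batch, so steps per epoch round up:

import math

N_examples = 1000    # hypothetical dataset size (assumption for illustration)
batch_size = 32
epochs = 100

steps_per_epoch = math.ceil(N_examples / batch_size)   # 32 weight updates per epoch
total_updates = steps_per_epoch * epochs               # 3200 weight updates in total
print(steps_per_epoch, total_updates)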

All in all, having epochs > 1 and having batch_size > 1 are compatible.

