Mini Batch Gradient Descent, Adam and epochs


Question

I am taking a course on Deep Learning in Python and I am stuck on the following lines of an example:

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)

From the definitions I know, 1 epoch = going through all training examples once to do one weight update.

batch_size is used by the optimizer to divide the training examples into mini-batches. Each mini-batch is of size batch_size.

I am not familiar with Adam optimization, but I believe it is a variation of GD or mini-batch GD. Gradient Descent uses one big batch (all the data) but multiple epochs. Mini-batch Gradient Descent uses multiple mini-batches, but only 1 epoch.

Then, how come the code has both multiple mini-batches and multiple epochs? Does epoch in this code have a different meaning than the definition above?

Answer

Your understanding of epoch and batch_size seems correct.

In more detail:

An epoch corresponds to one whole sweep over the training dataset. This sweep can be performed in several ways.

  • Batch mode: the gradient of the loss over the whole training dataset is used to update the model weights. One optimisation iteration corresponds to one epoch.
  • Stochastic mode: the gradient of the loss over a single training example is used to update the model weights. If there are N examples in the training dataset, N optimisation iterations correspond to one epoch.
  • Mini-batch mode: the gradient of the loss over a small sample of points from the training dataset is used to update the model weights. The sample is of size batch_size. If there are N_examples examples in the training dataset, N_examples/batch_size optimisation iterations correspond to one epoch (see the sketch after this list).
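To make the three modes concrete, here is a minimal NumPy sketch of the mini-batch case on a synthetic linear-regression problem. All names, sizes and the learning rate below are made up for illustration (the original question uses Keras rather than a hand-written loop); setting batch_size to N_examples would give batch mode, and setting it to 1 would give stochastic mode:

import numpy as np

# Toy linear-regression data (synthetic, for illustration only).
rng = np.random.default_rng(0)
N_examples = 96
X = rng.normal(size=(N_examples, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N_examples)

w = np.zeros(3)      # model weights
lr = 0.1             # learning rate
epochs = 5
batch_size = 32      # N_examples -> batch mode, 1 -> stochastic mode

for epoch in range(epochs):
    # One epoch = one full sweep over the training data.
    perm = rng.permutation(N_examples)
    for start in range(0, N_examples, batch_size):
        idx = perm[start:start + batch_size]
        X_b, y_b = X[idx], y[idx]
        # Gradient of the mean-squared-error loss over this mini-batch.
        grad = 2 * X_b.T @ (X_b @ w - y_b) / len(idx)
        w -= lr * grad   # one optimisation iteration (weight update)
    # Here N_examples/batch_size = 3 weight updates happen per epoch.

print(w)  # approaches true_w after a few epochs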

In your case (epochs=100, batch_size=32), the regressor would sweep the whole dataset 100 times, with mini data batches of size 32 (i.e. mini-batch mode).

If I assume your dataset size is N_examples, the regressor would perform N_examples/32 model weight optimisation iterations per epoch.

So for 100 epochs: 100 * N_examples/32 model weight optimisation iterations.
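As a quick sanity check, assuming a hypothetical dataset of 1,000 training examples (a made-up number; substitute your own), the counts work out as follows. Keras still processes a final partial batch, so steps per epoch round up:

import math

N_examples = 1000    # hypothetical dataset size (assumption for illustration)
batch_size = 32
epochs = 100

steps_per_epoch = math.ceil(N_examples / batch_size)   # 32 weight updates per epoch
total_updates = steps_per_epoch * epochs               # 3200 weight updates in total
print(steps_per_epoch, total_updates)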

All in all, having epochs > 1 and having batch_size > 1 are compatible.

