Choosing the number of steps per epoch
Question
If I want to train a model with train_generator, is there a significant difference between choosing
- 10 epochs with 500 steps each
and
- 100 epochs with 50 steps each
Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 data points. I figure I can get a smoother graph if I use 100 epochs, but I want to know first if there is any downside to this.
Recommended answer
Based on what you said, it sounds like you need a larger batch_size, and of course that has implications which could affect steps_per_epoch and the number of epochs.
To address the jumping around
- A larger batch size gives you a less noisy gradient estimate and helps to prevent jumping around.
- You may also want to consider a smaller learning rate, or a learning rate scheduler (or decay), to allow the network to "settle in" as it trains.
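As a rough illustration of learning rate decay, the sketch below computes an exponentially decaying learning rate per epoch. The function name `exponential_decay`, the starting rate of 0.01 and the decay factor of 0.9 are assumptions chosen for the example, not values from the answer. Keras offers a `LearningRateScheduler` callback for applying a per-epoch schedule; check its documentation for the expected function signature.

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Return the learning rate for a 0-indexed epoch.

    Hypothetical helper: the rate shrinks by a constant factor each epoch,
    so updates become smaller as training progresses.
    """
    return initial_lr * (decay_rate ** epoch)


# Illustrative values only: start at 0.01 and multiply by 0.9 every epoch.
schedule = [exponential_decay(0.01, 0.9, epoch) for epoch in range(5)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: learning rate {lr:.6f}")
```

Smaller rates late in training mean smaller weight updates, which is one way to let the loss curve settle.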
Implications of a larger batch size
- Too large a batch_size can cause memory problems, especially if you are using a GPU. Once you exceed the limit, dial it back until it works. This will help you find the maximum batch size that your system can handle.
- Too large a batch size can get you stuck in a local minimum, so if your training gets stuck, I would reduce it somewhat. Imagine that you are over-correcting the jumping around, so that it no longer jumps around enough to further minimize the loss function.
When to reduce epochs
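- If your training error is very low but your test/validation error is very high, then you have overfit the model by training for too many epochs.
- The best way to find the right balance is to use early stopping with a validation set. Here you can specify when to stop training, and save the weights of the network that gives you the best validation loss. (I highly recommend always using this.)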
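The logic behind early stopping can be sketched in plain Python as below. This is only an illustration of the idea, not the Keras implementation: the function name `early_stopping_epoch`, the `patience` value and the validation losses are all made up for the example. In Keras, the usual route is the `EarlyStopping` callback (monitoring `val_loss` with a `patience` setting), often combined with `ModelCheckpoint` to keep the best weights.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. best_epoch is the epoch whose weights
    would be kept, i.e. the one with the lowest validation loss seen.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    # Patience never ran out: training ran for every epoch supplied.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 2.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.61, 0.70, 0.80]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

With early stopping in place, the number of epochs becomes an upper bound rather than a value that has to be tuned exactly.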
When to adjust steps per epoch
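- Traditionally, steps per epoch is calculated as train_length // batch_size, since this uses all of the data points, one batch at a time.
- If you are augmenting the data, you can stretch this a bit (sometimes I multiply that value by 2 or 3, etc.). But if it is already training for too long, I would just stick with the traditional approach.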
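The arithmetic can be sketched as follows. The helper name `steps_per_epoch`, the dataset size of 16000 and the batch size of 32 are assumptions picked so that the result matches the 500 steps mentioned in the question. The last lines also compare the two configurations from the question by total number of batches processed.

```python
def steps_per_epoch(train_length, batch_size, augmentation_factor=1):
    """Steps needed to see every sample once, optionally stretched.

    augmentation_factor is a hypothetical multiplier (e.g. 2 or 3) for
    the case where the generator produces augmented data.
    """
    return (train_length // batch_size) * augmentation_factor


# Illustrative numbers only: 16000 training samples, batches of 32.
base_steps = steps_per_epoch(16000, 32)
augmented_steps = steps_per_epoch(16000, 32, augmentation_factor=2)
print(base_steps, augmented_steps)

# Both configurations from the question process the same number of batches.
total_steps_a = 10 * 500   # 10 epochs of 500 steps
total_steps_b = 100 * 50   # 100 epochs of 50 steps
print(total_steps_a, total_steps_b)
```

Since the totals match, the second configuration mainly changes how often per-epoch metrics and callbacks are evaluated. Note that with 50 steps an "epoch" covers only a tenth of this example dataset.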