Choosing the number of steps per epoch


Question

If I want to train a model with train_generator, is there a significant difference between choosing:

  • 10 epochs with 500 steps each

  • 100 epochs with 50 steps each

Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 data points. I figure I can get a smoother graph if I use 100 epochs, but I want to know first whether there is any downside to this.

Recommended answer

Based on what you said, it sounds like you need a larger batch_size. That choice has implications of its own, which could affect steps_per_epoch and the number of epochs.

Solving the jumping around

  • A larger batch size will give you a better (less noisy) gradient estimate and will help to prevent jumping around.
  • You may also want to consider a smaller learning rate, or a learning rate scheduler (or decay), to allow the network to "settle in" as it trains.
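
As an illustration of the learning rate scheduler idea, here is a minimal sketch of a step-decay schedule in plain Python. The function name, the initial rate of 0.01, the drop factor and the drop interval are all assumptions chosen for the example, not values from the answer. The `(epoch, lr)` signature is chosen so that the function could be handed to Keras's `LearningRateScheduler` callback, but the code below only simulates the schedule.

```python
def step_decay(epoch, lr, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs.

    Takes the 0-based epoch index and the current learning rate, and
    returns the learning rate to use for that epoch.
    """
    if epoch > 0 and epoch % epochs_per_drop == 0:
        return lr * drop
    return lr


# Simulate 30 epochs starting from a hypothetical initial rate of 0.01.
lr = 0.01
history = []
for epoch in range(30):
    lr = step_decay(epoch, lr)
    history.append(lr)

# Rates at the start, after the first drop, and after the second drop.
print(history[0], history[10], history[20])
```

With Keras, the same function would typically be wrapped as `LearningRateScheduler(step_decay)` and passed in the `callbacks` list when fitting; treat that wiring as an assumption to check against the version you are running.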

Implications of a larger batch size

  • A batch_size that is too large can cause memory problems, especially if you are using a GPU. Once you exceed the limit, dial it back until it works. This will help you find the maximum batch size that your system can handle.
  • A batch size that is too large can get you stuck in a local minimum, so if your training gets stuck, I would reduce it somewhat. Think of it as over-correcting the jumping around: the updates no longer jump around enough to further minimize the loss function.
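
The "dial it back until it works" procedure can be sketched as a simple halving search. This is only an illustration: `largest_working_batch_size`, `try_batch` and `fake_train_step` are hypothetical names, and the 96-sample limit is made up so the example runs anywhere. A real framework raises its own out-of-memory error type, which you would catch in place of `MemoryError`.

```python
def largest_working_batch_size(try_batch, start=1024, minimum=1):
    """Halve the batch size until `try_batch` stops raising MemoryError."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_batch(batch_size)  # e.g. run one training step at this size
            return batch_size
        except MemoryError:
            batch_size //= 2       # too big: dial it back and retry
    raise RuntimeError("no batch size fits in memory")


# Stand-in for a real training step: pretend that anything above
# 96 samples per batch exhausts memory.
def fake_train_step(batch_size):
    if batch_size > 96:
        raise MemoryError(f"batch of {batch_size} does not fit")


print(largest_working_batch_size(fake_train_step))
```

Halving only finds the largest fitting size along the sequence it tries, not the exact limit. That is usually good enough, and as the second bullet notes, the largest size that fits is not necessarily the one that trains best.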

When to reduce epochs

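  • If your training error is very low but your test/validation error is very high, then you have overfit the model by training for too many epochs.
  • The best way to find the right balance is to use early stopping with a validation set. Here you specify when to stop training, and you save the weights of the network that gives the best validation loss. (I highly recommend always using this.)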
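
To make the early stopping rule concrete, here is a minimal sketch of the patience logic in plain Python, run over a made-up list of validation losses. The function name, the `patience` and `min_delta` parameters, and the loss values are assumptions for illustration. In Keras this behaviour is provided by the `EarlyStopping` callback (for example with `monitor="val_loss"` and `restore_best_weights=True`), which you would normally use instead of writing your own.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. Epochs are 0-based.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: a real training loop would save the weights here.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Never ran out of patience: trained through the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60]
stop_epoch, best_epoch = early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)
```

Because the stopping point is found automatically, the number of epochs becomes an upper bound rather than a value you have to tune by hand.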

When to adjust steps per epoch

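  • Traditionally, the steps per epoch are calculated as train_length // batch_size, since this uses all of the data points, one batch at a time.
  • If you are augmenting the data, then you can stretch this a bit (sometimes I multiply the value above by 2 or 3). But if training already takes too long, I would just stick with the traditional approach.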
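
The arithmetic can be sketched as follows. The helper name, the dataset size of 16,000 samples and the batch size of 32 are assumptions picked so the numbers line up with the 500-step epochs in the question. The last lines also show that the two schedules from the question perform the same total number of training steps, so the difference between them lies in how often per-epoch metrics and callbacks are evaluated.

```python
def steps_per_epoch(train_length, batch_size, augmentation_factor=1):
    """Batches per epoch: one pass over the data, optionally stretched
    by an integer factor when the data is augmented."""
    return (train_length // batch_size) * augmentation_factor


# Hypothetical dataset: 16,000 training samples in batches of 32.
base = steps_per_epoch(16_000, 32)          # 500
stretched = steps_per_epoch(16_000, 32, 2)  # 1000

# The two schedules from the question: same number of weight updates.
total_a = 10 * 500
total_b = 100 * 50
print(base, stretched, total_a == total_b)
```

Note that floor division drops a final partial batch when train_length is not a multiple of batch_size. Shorter epochs give more points on the training curve, at the cost of running validation and any per-epoch callbacks more often.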
