Why not train for partial epochs?


Question

Nobody ever seems to run their model for say '10.5' epochs. What is the theoretical reason for this?

It is somewhat intuitive to me that if I had a training set of perfectly unique samples, the optimal knee point between undertraining and overtraining should be between full epochs. However, in most cases individual training samples will often be similar/related in one way or another.

Is there a solid statistics-based reason? Or, failing that, has anyone investigated this empirically?

Answer

I dispute the premise: where I work, we often run for partial epochs, although the range is higher for the large data sets: say, 40.72 epochs.
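As a rough illustration of how a figure like 40.72 epochs can arise, one can budget training in optimizer steps rather than in whole passes over the data. The sketch below is only an assumption about how such a setup might look, not a description of any particular team's pipeline. The names `steps_for_epochs` and `batches`, and the sample and batch sizes, are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence


def steps_for_epochs(num_samples: int, batch_size: int, epochs: float) -> int:
    """Convert a (possibly fractional) epoch budget into a whole number of steps."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return round(epochs * steps_per_epoch)


def batches(data: Sequence[int], batch_size: int, total_steps: int,
            seed: int = 0) -> Iterator[List[int]]:
    """Yield exactly total_steps batches, reshuffling at each epoch boundary."""
    if len(data) == 0:
        raise ValueError("data must not be empty")
    rng = random.Random(seed)
    step = 0
    while step < total_steps:
        order = list(data)
        rng.shuffle(order)  # fresh ordering for every pass over the data
        for start in range(0, len(order), batch_size):
            if step >= total_steps:
                return  # stop mid-epoch once the step budget is spent
            yield order[start:start + batch_size]
            step += 1


if __name__ == "__main__":
    # Hypothetical sizes: 1000 samples in batches of 10 is 100 steps per epoch.
    total = steps_for_epochs(num_samples=1000, batch_size=10, epochs=40.72)
    seen = sum(len(b) for b in batches(range(1000), 10, total))
    print(total, seen)
```

With a budget that ends mid-epoch, some samples are visited one more time than others, which is the unequal weighting discussed in the next paragraph.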

For small data sets or short training, it's a matter of treating each observation with equal weight, so it's natural to think that one needs to process each the same number of times. As you point out, if the input samples are related, then it's less important to do so.

I would think that one basic reason is convenience: integers are easier to interpret and discuss. For many models, there is no knee at optimal training: it's a gentle curve, so there is almost certainly a whole number of epochs within the "sweet spot" of accuracy. Thus, it's more convenient to report that 10 epochs is a little better than 11, even if the optimal point (found with multiple training runs at tiny differences in iteration count) happens to be 10.2 epochs. Diminishing returns mean that if 9-12 epochs give us very similar, good results, we simply note that 10 gives the best performance in the range 8-15 epochs, accept the result, and get on with the rest of life.
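The "gentle curve" argument can be sketched in code: evaluate at checkpoints finer than one epoch, find the best one, then list every checkpoint whose score is within a small tolerance of it. The helper `sweet_spot` and the curve below are hypothetical; the curve is synthetic and chosen only to peak at 10.2 epochs, so it says nothing about any real model.

```python
from typing import List, Tuple


def sweet_spot(history: List[Tuple[float, float]], tolerance: float
               ) -> Tuple[float, List[float]]:
    """Return the best checkpoint and every checkpoint within `tolerance` of it.

    history holds (epochs_elapsed, validation_score) pairs; higher is better.
    """
    if not history:
        raise ValueError("history must not be empty")
    best_epoch, best_score = max(history, key=lambda pair: pair[1])
    near_best = [epoch for epoch, score in history
                 if best_score - score <= tolerance]
    return best_epoch, near_best


if __name__ == "__main__":
    # Synthetic, made-up curve peaking at 10.2 epochs; not real measurements.
    curve = [(e / 5, 0.90 - 0.0005 * (e / 5 - 10.2) ** 2) for e in range(40, 76)]
    best, band = sweet_spot(curve, tolerance=0.001)
    # Whole-epoch checkpoints that are practically as good as the true optimum.
    whole = [e for e in band if float(e).is_integer()]
    print(best, whole)
```

When the near-best band contains at least one whole number of epochs, reporting that integer loses very little, which is the convenience argument above.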
