How do we analyse a loss vs epochs graph?


Problem description

I'm training a language model, and the loss vs. epochs curve is plotted after each training run. I'm attaching two samples from it.

Obviously, the second one shows better performance. But from these graphs, when do we decide to stop training (early stopping)?

Can we understand overfitting and underfitting from these graphs, or do I need to plot additional learning curves?

What are the additional inferences that can be made from these plots?

Answer

The first conclusion is obviously that the first model performs worse than the second, and that is generally true, as long as you use the same data for validation. In the case where you train a model with different splits, that might not necessarily be the case.

Furthermore, to answer your question regarding overfitting/underfitting: a typical overfitting graph looks like this:

[Figure: the training loss keeps decreasing while the validation loss levels off and then starts to rise.]
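In case you want to draw such a learning curve yourself, here is a minimal sketch using matplotlib. It assumes a Keras `History` object named `history`, as returned by `model.fit` (the names are placeholders, not from the question):

```python
import matplotlib.pyplot as plt

# `history` is assumed to be the History object returned by model.fit(...).
# Diverging curves (training keeps falling while validation rises) suggest
# overfitting; two curves that plateau at a high loss suggest underfitting.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```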


So, in your case, you clearly just reach convergence, but don't actually overfit! (This is great news!) On the other hand, you could ask yourself whether you could achieve even better results. I am assuming that you are decaying your learning rate, which makes the loss flatten out into some form of plateau. If that is the case, try decaying the learning rate less aggressively at first, and see if you can reduce the loss even further.
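One way to do that in Keras, assuming that is your framework (`model`, `x_train` and `y_train` are illustrative placeholders), is to decay the learning rate only when progress actually stalls, e.g. with the ReduceLROnPlateau callback. This is a sketch, not the questioner's setup:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate only after the validation loss has stopped
# improving for 3 epochs, instead of decaying it on a fixed schedule.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                              patience=3, min_lr=1e-6)

# `model`, `x_train`, `y_train` are assumed placeholders.
history = model.fit(x_train, y_train, validation_split=0.1,
                    epochs=100, callbacks=[reduce_lr])
```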
Moreover, if you still see a very long plateau, you can also consider stopping early, since you are effectively gaining no more improvement. Depending on your framework, there are implementations of this (for example, Keras has callbacks for early stopping, which are generally tied to the validation/test error). If your validation error increases, as in the image above, you should consider using the lowest validation error as the point for early stopping. One way I like to do this is to checkpoint the model every now and then, but only if the validation error has improved.
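As a concrete sketch of that pattern in Keras (again, `model` and the data names are assumptions): EarlyStopping halts training once the validation loss stops improving and can roll back to the best weights, while ModelCheckpoint with save_best_only=True writes a checkpoint only when the validation loss improves:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop after 5 epochs without val_loss improvement and restore
    # the weights from the best epoch seen so far.
    EarlyStopping(monitor="val_loss", patience=5,
                  restore_best_weights=True),
    # Save a checkpoint only when val_loss improves.
    ModelCheckpoint("best_model.keras", monitor="val_loss",
                    save_best_only=True),
]

model.fit(x_train, y_train, validation_split=0.1,
          epochs=100, callbacks=callbacks)
```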
Another inference you can make concerns the learning rate in general: if it is too large, your graph will likely be very "jumpy/jagged", whereas a very low learning rate produces only a small decline in the error, rather than the usual exponentially decaying behavior.
You can see a weak form of this by comparing the steepness of the decline in the first few epochs in your two examples, where the first one (with the lower learning rate) takes longer to converge.

Lastly, if your training and test error are very far apart (as in the first case), you might ask yourself whether you are actually accurately describing or modeling the problem; in some instances, you might realize that there is some problem in the (data) distribution that you might have overlooked. Since the second graph is way better, though, I doubt this is the case in your problem.
