How do we analyse a loss vs epochs graph?

Question

I'm training a language model, and the loss vs epochs curve is plotted for each training run. I'm attaching two samples of it.

Obviously, the second one is showing better performance. But, from these graphs, when do we take a decision to stop training (early stopping)?

Can we understand overfitting and underfitting from these graphs or do I need to plot additional learning curves?

What are the additional inferences that can be made from these plots?

Answer

The first conclusion is obviously that the first model performs worse than the second, and that is generally true, as long as you use the same data for validation. If you train the models on different splits, that is not necessarily true.

Furthermore, to answer your question regarding overfitting/underfitting: in a typical overfitting graph, the training loss keeps decreasing while the validation loss flattens out and then starts rising again.
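
If you train with Keras, the History object returned by model.fit() already holds both curves, so a short matplotlib sketch is enough to check for that pattern (here `history` is assumed to come from your own fit() call):

```python
import matplotlib.pyplot as plt

# `history` is the object returned by model.fit(...); "loss" and
# "val_loss" are the standard Keras keys for the two curves.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
# Overfitting shows up as the validation curve turning upward
# while the training curve keeps falling.
```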

So, in your case, you clearly just reach convergence, but don't actually overfit! (This is great news!) On the other hand, you could ask yourself whether you could achieve even better results. I am assuming that you are decaying your learning rate, which makes the curve level out at some form of plateau. If that is the case, try decaying the learning rate less aggressively at first, and see if you can reduce the loss even further.
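
One way to experiment with a gentler decay in Keras is ReduceLROnPlateau, which only lowers the learning rate once the validation loss actually stalls, instead of decaying it on a fixed schedule. A minimal sketch, where `model`, `x_train`, `y_train`, `x_val`, and `y_val` are placeholders for your own objects and the factor/patience values are illustrative:

```python
import tensorflow as tf

# Halve the learning rate after val_loss has stalled for 3 epochs,
# rather than decaying it every epoch regardless of progress.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",  # watch the validation curve
    factor=0.5,          # multiply the learning rate by this on a plateau
    patience=3,          # epochs without improvement before reducing
    min_lr=1e-6,         # lower bound on the learning rate
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,
    callbacks=[reduce_lr],
)
```
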
Moreover, if you still see a very long plateau, you can also consider stopping early, since you effectively gain no more improvement. Depending on your framework, there are implementations of this (for example, Keras has callbacks for early stopping, which are generally tied to the validation/test error). If your validation error increases, similar to the image, you should consider using the lowest validation error as the point for early stopping. One way I like to do this is to checkpoint the model every now and then, but only if the validation error has improved.
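
Since the answer already mentions Keras callbacks, here is a minimal sketch of both ideas combined: early stopping on the validation loss plus checkpointing only on improvement. The file path and patience value are illustrative choices, and `model` and the data arrays are again placeholders:

```python
import tensorflow as tf

# Stop once val_loss has not improved for 5 epochs, and roll back
# to the weights from the best epoch rather than the last one.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Save a checkpoint only when val_loss improves, so the file always
# holds the model with the lowest validation error seen so far.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras",  # illustrative path
    monitor="val_loss",
    save_best_only=True,
)

model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,
    callbacks=[early_stop, checkpoint],
)
```
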
Another inference you can make concerns the learning rate in general: if it is too large, your graph will likely be very "jumpy/jagged", whereas a very low learning rate will produce only a slight decline in the error rather than the usual exponentially decaying behavior.
You can see a weak form of this by comparing the steepness of the decline in the first few epochs in your two examples, where the first one (with the lower learning rate) takes longer to converge.

Lastly, if your training and test error are very far apart (as in the first case), you might ask yourself whether you are actually accurately describing or modeling the problem; in some instances, you might realize that there is some problem in the (data) distribution that you might have overlooked. Since the second graph is way better, though, I doubt this is the case in your problem.
