Why do too many epochs cause overfitting?


Problem description

I am reading the Deep Learning with Python book. After reading chapter 4, Fighting Overfitting, I have two questions.

  1. Why might increasing the number of epochs cause overfitting? I know increasing the number of epochs will involve more attempts at gradient descent, but will this cause overfitting?

  2. During the process of fighting overfitting, will the accuracy be reduced?

Answer

I'm not sure which book you are reading, so some background information may help before I answer the questions specifically.

Firstly, increasing the number of epochs won't necessarily cause overfitting, but it certainly can. If the learning rate is small and the model has relatively few parameters, it may take many epochs before overfitting becomes measurable. That said, it is common for more training to do so.

To keep the question in perspective, it's important to remember that we most commonly use neural networks to build models we can use for prediction (e.g. predicting whether an image contains a particular object or what the value of a variable will be in the next time step).

We build the model by iteratively adjusting weights and biases so that the network can act as a function to translate between input data and predicted outputs. We turn to such models for a number of reasons, often because we just don't know what the function is/should be or the function is too complex to develop analytically. In order for the network to be able to model such complex functions, it must be capable of being highly-complex itself. Whilst this complexity is powerful, it is dangerous! The model can become so complex that it can effectively remember the training data very precisely but then fail to act as an effective, general function that works for data outside of the training set. I.e. it can overfit.

You can think of it as being a bit like someone (the model) who learns to bake by only baking fruit cake (training data) over and over again – soon they'll be able to bake an excellent fruit cake without using a recipe (training), but they probably won't be able to bake a sponge cake (unseen data) very well.

Back to neural networks! Because the risk of overfitting is high with a neural network there are many tools and tricks available to the deep learning engineer to prevent overfitting, such as the use of dropout. These tools and tricks are collectively known as 'regularisation'.
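
As a concrete illustration of one such regularisation tool, here is a minimal Keras sketch of a model that uses dropout. The layer sizes, dropout rate and input shape are assumptions chosen for the example, not something taken from the original answer:

```python
# Minimal sketch: dropout as a regularisation layer in Keras.
# The architecture, dropout rate (0.5) and input size (784) are
# illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dropout(0.5),   # randomly zero 50% of activations during training
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```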

This is why we use development and training strategies involving test datasets – we pretend that the test data is unseen and monitor it during training. In a typical plot of training and test error against the number of epochs, the test error begins to increase after some point (about 50 epochs in the original answer's example) as the model has started to 'memorise the training set', despite the training error remaining at its minimum value (often the training error will continue to improve).
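
A minimal sketch of how such training/validation curves can be produced in Keras, assuming a compiled `model` (for example the one sketched above) and already-prepared arrays `x_train`, `y_train`, `x_val`, `y_val`:

```python
# Minimal sketch: monitor held-out data during training and plot the curves.
# `model`, `x_train`, `y_train`, `x_val`, `y_val` are assumed to exist already.
import matplotlib.pyplot as plt

history = model.fit(x_train, y_train,
                    epochs=100,
                    batch_size=128,
                    validation_data=(x_val, y_val))

epochs = range(1, len(history.history["loss"]) + 1)
plt.plot(epochs, history.history["loss"], label="training loss")
plt.plot(epochs, history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```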

So, to answer your questions:

  1. Allowing the model to continue training (i.e. more epochs) increases the risk of the weights and biases being tuned to such an extent that the model performs poorly on unseen (or test/validation) data. The model is now just 'memorising the training set'.

  2. Continued epochs may well increase training accuracy, but this doesn't necessarily mean the model's predictions on new data will be accurate – often they actually get worse. To prevent this, we use a test dataset and monitor the test accuracy during training. This allows us to make a more informed decision about whether the model is becoming more accurate on unseen data.

We can use a technique called early stopping, whereby we stop training the model once the test accuracy has stopped improving for a small number of epochs. Early stopping can be thought of as another regularisation technique.
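
For example, in Keras early stopping is available as the built-in `EarlyStopping` callback; the monitored metric and the `patience` value below are illustrative choices, not taken from the original answer:

```python
# Minimal sketch: stop training once the validation metric stops improving.
# The monitor and patience values are illustrative assumptions.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",        # watch held-out loss
                           patience=5,                # allow 5 epochs without improvement
                           restore_best_weights=True)

history = model.fit(x_train, y_train,
                    epochs=200,
                    batch_size=128,
                    validation_data=(x_val, y_val),
                    callbacks=[early_stop])
```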

