Reducing (Versus Delaying) Overfitting in Neural Network


Problem Description

In neural nets, regularization (e.g. L2, dropout) is commonly used to reduce overfitting. For example, the plot below shows typical loss vs. epoch, with and without dropout. Solid lines = training, dashed = validation, blue = baseline (no dropout), orange = with dropout. Plot courtesy of the TensorFlow tutorials. Weight regularization behaves similarly.
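For concreteness, here is a minimal sketch of that kind of comparison in tf.keras (the architecture, the `build_model` helper, and the `x_train`/`y_train` arrays are illustrative placeholders, not the tutorial's actual code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(dropout_rate=0.0):
    # Identical architecture with and without dropout; rate 0.0 = baseline.
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# x_train / y_train stand in for your own training data.
baseline = build_model(0.0).fit(x_train, y_train, validation_split=0.2,
                                epochs=50, verbose=0)
dropped = build_model(0.5).fit(x_train, y_train, validation_split=0.2,
                               epochs=50, verbose=0)
# Plotting baseline.history["val_loss"] against dropped.history["val_loss"]
# reproduces the kind of comparison described above.
```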

Regularization delays the epoch at which validation loss starts to increase, but regularization apparently does not decrease the minimum value of validation loss (at least in my models and the tutorial from which the above plot is taken).

If we use early stopping to halt training when validation loss is at its minimum (to avoid overfitting), and if regularization only delays that minimum point (rather than decreasing the minimum value), then regularization does not yield a network that generalizes better; it merely slows down training.
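Such early stopping can be expressed in tf.keras roughly as follows (a sketch assuming the compiled `model` and data arrays from the example above; the `patience` value is an arbitrary choice):

```python
# Stop when validation loss stops improving and restore the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # track validation loss
    patience=10,                # epochs to wait past the minimum (arbitrary here)
    restore_best_weights=True,  # roll back to the best-val-loss epoch
)
model.fit(x_train, y_train, validation_split=0.2,
          epochs=200, callbacks=[early_stop])
```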

How can regularization be used to reduce the minimum validation loss (to improve model generalization) as opposed to just delaying it? If regularization is only delaying minimum validation loss and not reducing it, then why use it?

Solution

Over-generalizing from a single tutorial plot is arguably not a good idea; here is a relevant plot from the original dropout paper (Srivastava et al., 2014):

Clearly, if the only effect of dropout were to delay convergence, it would not be of much use. But of course it does not always work (as your plot clearly suggests), hence it should not be used by default (which is arguably the lesson here)...
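One way to act on that lesson is to treat the dropout rate as a hyperparameter and keep it only when it actually lowers the minimum validation loss, rather than merely shifting it to a later epoch. A minimal sketch, reusing the hypothetical `build_model` and data from the question above:

```python
# Compare the minimum validation loss across dropout rates; keep dropout
# only if some nonzero rate beats the 0.0 baseline.
for rate in [0.0, 0.2, 0.5]:
    hist = build_model(rate).fit(x_train, y_train, validation_split=0.2,
                                 epochs=50, verbose=0)
    print(f"dropout={rate}: min val_loss = {min(hist.history['val_loss']):.4f}")
```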
