Convolutional Neural Network - Dropout kills performance

Problem description

I'm building a convolutional neural network using TensorFlow (I'm new to both) in order to recognize letters. I'm seeing very weird behavior with the dropout layer: if I leave it out (i.e., keep_proba at 1), the network performs quite well and learns (see the TensorBoard screenshots of accuracy and loss below, with training in blue and testing in orange).

However, when I enable the dropout layer during the training phase (I tried keep_proba at 0.8 and 0.5), the network learns nothing: the loss quickly falls to around 3 or 4 and then stops moving (I also noticed that the network always predicts the same values, regardless of the input image). Same graphs:

What could be causing this weird behavior? I've read that dropout is a good way to avoid overfitting. Am I using it wrong?

Here's my network architecture, in case it's useful: CONVOLUTION -> MAX_POOL -> RELU -> CONVOLUTION -> MAX_POOL -> RELU -> FC 1024 neurons -> DROPOUT -> OUTPUT LAYER.
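For reference, the stack above might look something like the following TF1-style sketch (via tf.compat.v1). The input size (28x28 grayscale), filter counts, learning rate, and the 26-letter class count are illustrative assumptions, not values from the question; the key detail is the keep_prob placeholder, which is fed a value below 1.0 only during training and 1.0 during evaluation.

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # image batch (size assumed)
y = tf.placeholder(tf.int64, [None])               # letter labels, 0..25
keep_prob = tf.placeholder(tf.float32)             # <1.0 at train time, 1.0 at eval time

conv1 = tf.layers.conv2d(x, 32, 5, padding='same')
act1 = tf.nn.relu(tf.layers.max_pooling2d(conv1, 2, 2))
conv2 = tf.layers.conv2d(act1, 64, 5, padding='same')
act2 = tf.nn.relu(tf.layers.max_pooling2d(conv2, 2, 2))

fc = tf.layers.dense(tf.layers.flatten(act2), 1024, activation=tf.nn.relu)
drop = tf.nn.dropout(fc, keep_prob=keep_prob)      # dropout on the FC layer only
logits = tf.layers.dense(drop, 26)                 # output layer

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Training step: sess.run(train_op, {x: batch_x, y: batch_y, keep_prob: 0.8})
# Test step:     sess.run(loss,     {x: test_x,  y: test_y,  keep_prob: 1.0})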

Thanks a lot for any help or ideas.

Recommended answer

Dropout is a regularization technique that helps networks fit data more robustly by probabilistically removing neurons at training time according to the dropout rate. It can be a powerful tool for mitigating overfitting in a neural network.

There are really no hard-and-fast rules for "how much" dropout regularization to use. With the right training data you might not need any dropout at all, while in other cases its absence will result in serious overfitting pathologies. In your case, it appears that dropout rates of 50% or 80% may be excessive (over-regularization generally leads to under-fitting).

The typical indicator of overfitting pathology is divergence between the train and test sets: often both improve for a while, but then the training error keeps going down while the test error starts moving in the opposite direction. While your training error is clearly lower than your test error, the test error never deteriorates over the training period (which would be an unambiguous indicator of overfitting).

There may still be an opportunity to trade off some training error for better out-of-sample prediction error (typically the ultimate goal) with a modest amount of dropout. The only way to know is to test with more modest dropout rates (I'd start with something like 20% and experiment around that value) and see whether the training error improves (if it does not, you can reduce the dropout rate even further). In the best case, your out-of-sample test error gets better at the expense of some increase in the training error (or slower convergence of the training error). If you are over-regularized, though, you'll see degradation of both (as is pretty clearly evident in your second set of plots).
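One way to run that test is a small sweep over modest dropout rates, retraining each time and comparing the final train and test errors; build_and_train below is a hypothetical stand-in for the asker's full training loop, not code from the question.

# Sweep a few modest dropout rates and compare final errors.
# build_and_train is a hypothetical helper wrapping the full training loop.
for drop_rate in [0.0, 0.1, 0.2, 0.3]:  # i.e. keep_prob = 1.0 down to 0.7
    train_err, test_err = build_and_train(keep_prob=1.0 - drop_rate)
    print("dropout=%.1f  train_err=%.3f  test_err=%.3f"
          % (drop_rate, train_err, test_err))
# Keep the rate with the lowest test error; if the training error degrades
# while the test error does not improve, that rate is over-regularizing.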

As others have noted, you may find that dropout regularization is more effective in the convolutional layers (or not; it's hard to say without trying it). The space of model structures and hyperparameter settings is far too big to search effectively, and there isn't much theory to guide our choices. In general, it's best to start from recipes that have been demonstrated to work well on similar problems (based on published results) and to test and experiment from there.

Working effectively with neural networks has a lot to do with learning to recognize these dynamics in the train/test metrics, which lets you judge whether a change to the model structure or hyperparameters (including the dropout rate) is an improvement.
