Prevention of overfitting in convolutional layers of a CNN


Question


I'm using TensorFlow to train a Convolutional Neural Network (CNN) for a sign language application. The CNN has to classify 27 different labels, so unsurprisingly, a major problem has been addressing overfitting. I've taken several steps to accomplish this:



  1. I've collected a large amount of high-quality training data (over 5000 samples per label).
  2. I've built a reasonably sophisticated pre-processing stage to help maximize invariance to things like lighting conditions.
  3. I'm using dropout on the fully-connected layers.
  4. I'm applying L2 regularization to the fully-connected parameters.
  5. I've done extensive hyper-parameter optimization (to the extent possible given HW and time limitations) to identify the simplest model that can achieve close to 0% loss on training data.
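Steps 3 and 4 are straightforward to pin down. As a framework-free sketch (the function names and the default `lam` are illustrative, not taken from the question), L2 regularization simply adds a scaled sum of squared weights to the task loss:

```python
def l2_penalty(weights, lam=1e-4):
    """Sum of squared weights, scaled by the regularization strength lam."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, fc_weights, lam=1e-4):
    # Total loss = task loss + L2 penalty over the fully-connected weights.
    return data_loss + l2_penalty(fc_weights, lam)
```

In TensorFlow the same effect is typically obtained by adding a weight-decay term for each fully-connected weight tensor to the loss being minimized.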


Unfortunately, even after all these steps, I'm finding that I can't achieve much better than about 3% test error. (That's not terrible, but for the application to be viable, I'll need to improve it substantially.)


I suspect that the source of the overfitting lies in the convolutional layers since I'm not taking any explicit steps there to regularize (besides keeping the layers as small as possible). But based on examples provided with TensorFlow, it doesn't appear that regularization or dropout is typically applied to convolutional layers.
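For reference, the dropout mechanism itself is the same wherever it is applied; whether to apply it after a convolutional layer is a design choice, not a change to the operation. A pure-Python sketch of inverted dropout (not TensorFlow code; names are illustrative):

```python
import random

def dropout(activations, rate=0.5, rng=random, train=True):
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1/(1 - rate) so the expected value is unchanged.
    At test time the activations pass through untouched."""
    if not train or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

When dropout is used on convolutional layers, the keep probability is usually set higher (e.g. 0.8-0.9) than on fully-connected layers, since conv layers have far fewer parameters and overfit less aggressively.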


The only approach I've found online that explicitly deals with prevention of overfitting in convolutional layers is a fairly new approach called Stochastic Pooling. Unfortunately, it appears that there is no implementation for this in TensorFlow, at least not yet.
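The core idea of stochastic pooling is simple to state: within each pooling region, one activation is sampled with probability proportional to its (non-negative) magnitude, instead of always taking the maximum. A pure-Python sketch of a single pooling region (not a TensorFlow op; names are illustrative):

```python
import random

def stochastic_pool(region, rng=random):
    """Sample one activation from a pooling region with probability
    proportional to its magnitude; a region of all zeros pools to 0."""
    total = sum(region)
    if total == 0:
        return 0.0
    r = rng.uniform(0, total)  # pick a point on the cumulative mass
    acc = 0.0
    for a in region:
        acc += a
        if r <= acc:
            return a
    return region[-1]
```

At test time the original paper replaces sampling with a probability-weighted average, which is deterministic.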


So in short, is there a recommended approach to prevent overfitting in convolutional layers that can be achieved in TensorFlow? Or will it be necessary to create a custom pooling operator to support the Stochastic Pooling approach?

Any guidance is appreciated!

Answer


  • How can I fight overfitting?

    • Get more data (or data augmentation)
    • Dropout (see the original paper and explanations of dropout for CNNs)
    • DropConnect
    • Regularization (see my master's thesis, page 85, for examples)
    • Feature scale clipping
    • Global average pooling
    • Make network smaller
    • Early stopping


      Thoma, Martin. "Analysis and Optimization of Convolutional Neural Network Architectures." arXiv preprint arXiv:1707.09725 (2017).


      See chapter 2.5 for analysis techniques. As written in the beginning of that chapter, you can usually do the following:



      • (I1) Change the problem definition (e.g., the classes which are to be distinguished)
      • (I2) Get more training data
      • (I3) Clean the training data
      • (I4) Change the preprocessing (see Appendix B.1)
      • (I5) Augment the training data set (see Appendix B.2)
      • (I6) Change the training setup (see Appendices B.3 to B.5)
      • (I7) Change the model (see Appendices B.6 and B.7)
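Of these, (I5) is often the cheapest improvement. A minimal pure-Python sketch of one common augmentation, horizontal flipping (an assumption on my part: whether a mirrored gesture keeps its label depends on the sign set, so this particular transform may not be valid for every sign):

```python
def hflip(image):
    """Horizontally flip an image given as a list of rows.
    NOTE: for sign language, a mirrored gesture may change meaning,
    so check label validity before using this transform."""
    return [row[::-1] for row in image]

def augment(dataset):
    # Double the dataset by adding a mirrored copy of every image.
    return dataset + [hflip(img) for img in dataset]
```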


      The CNN has to classify 27 different labels, so unsurprisingly, a major problem has been addressing overfitting.


      I don't understand how this is connected. You can have hundreds of labels without a problem of overfitting.
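One of the remedies from the list above, early stopping, is simple enough to pin down precisely: stop training once the validation loss has failed to improve for a fixed number of epochs. A pure-Python sketch (the names and the `patience` default are illustrative):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop: the first
    epoch after which the validation loss has not improved for
    `patience` consecutive epochs, or the last epoch otherwise."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1
```

In practice one also keeps a checkpoint of the weights from the best epoch and restores them when stopping.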
