Dropout with densely connected layer


Question

I am using a DenseNet model for one of my projects and have some difficulties using regularization.

Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.

So I decided to use dropout to avoid overfitting. When using dropout, both validation and training loss decrease to about 0.13 during the first epoch and remain constant for about 10 epochs.

After that, both loss functions decrease in the same way as without dropout, resulting in overfitting again. The final loss value is in about the same range as without dropout.

So for me it seems like dropout is not really working.

If I switch to L2 regularization though, I am able to avoid overfitting, but I would rather use dropout as a regularizer.
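For reference, one way L2 weight decay can be attached in the same TF1-style API the code below uses is via a kernel regularizer on each convolution. This is only a minimal sketch: the conv_layer wrapper is not shown in the question, so its body is assumed here to wrap tf.layers.conv2d, and the scale of 1e-4 is just an example value.

import tensorflow as tf

# Hypothetical stand-in for the conv_layer wrapper called in the code below,
# extended with an L2 kernel regularizer (TF1-style API, as in the question).
def conv_layer(x, filter, kernel, layer_name, l2_scale=1e-4):
    return tf.layers.conv2d(
        inputs=x,
        filters=filter,
        kernel_size=kernel,
        padding='SAME',
        use_bias=False,
        kernel_regularizer=tf.contrib.layers.l2_regularizer(l2_scale),
        name=layer_name)

# The collected penalty terms would then be added to the data loss, e.g.:
# total_loss = mse_loss + tf.losses.get_regularization_loss()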

Now I am wondering if anyone has experienced that kind of behaviour?

I use dropout in both the dense block (bottleneck layer) and in the transition block (dropout rate = 0.5):

def bottleneck_layer(self, x, scope):
    # DenseNet bottleneck: BN -> ReLU -> 1x1 conv -> dropout, then BN -> ReLU -> 3x3 conv -> dropout
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=4 * self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch2')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[3,3], layer_name=scope+'_conv2')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        return x

def transition_layer(self, x, scope):
    # DenseNet transition: BN -> ReLU -> 1x1 conv -> dropout -> 2x2 average pooling
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        x = Average_pooling(x, pool_size=[2,2], stride=2)

        return x
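The helper wrappers (Batch_Normalization, Relu, Drop_out, Average_pooling) are not defined in the question. Assuming they are thin wrappers over the TF1 tf.layers API, they would look roughly like the sketch below; the signatures are guessed from the call sites above.

import tensorflow as tf

def Relu(x):
    return tf.nn.relu(x)

def Drop_out(x, rate, training):
    # tf.layers.dropout only zeroes activations when training is True;
    # at evaluation time the layer is a no-op.
    return tf.layers.dropout(inputs=x, rate=rate, training=training)

def Average_pooling(x, pool_size=[2, 2], stride=2, padding='VALID'):
    return tf.layers.average_pooling2d(inputs=x, pool_size=pool_size,
                                       strides=stride, padding=padding)

def Batch_Normalization(x, training, scope):
    with tf.variable_scope(scope):
        return tf.layers.batch_normalization(inputs=x, training=training)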

Answer

"Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model."

This is not overfitting.

Overfitting starts when your validation loss starts increasing, while your training loss continues decreasing; here is its telltale signature:

[Figure: the classic overfitting curve - training error keeps decreasing while validation error reaches a minimum and then rises.] The image is adapted from the Wikipedia entry on overfitting - different things may lie on the horizontal axis, e.g. depth or number of boosted trees, number of neural net fitting iterations etc.

The (generally expected) difference between training and validation loss is something completely different, called the generalization gap:

"An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution."

where, practically speaking, validation data is indeed unseen data.
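To make that criterion concrete, here is a small framework-agnostic helper (not part of the original answer) that scans logged loss curves and reports the epoch at which the telltale signature appears:

def overfitting_onset(train_loss, val_loss, patience=3):
    # Returns the epoch index after which validation loss rises for
    # `patience` consecutive epochs while training loss keeps falling,
    # i.e. the point where overfitting sets in; returns None otherwise.
    best_val = float('inf')
    rising = 0
    for epoch, (tr, va) in enumerate(zip(train_loss, val_loss)):
        if va < best_val:
            best_val, rising = va, 0
        else:
            rising += 1
            if rising >= patience and tr < train_loss[epoch - rising]:
                return epoch - rising
    return None

For example, overfitting_onset([0.5, 0.4, 0.3, 0.25, 0.2], [0.5, 0.45, 0.44, 0.46, 0.48], patience=2) returns 2, the epoch where validation loss bottomed out while training loss kept falling.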

"So for me it seems like dropout is not really working."

It can very well be the case - dropout is not expected to work always and for every problem.
