Adding batch normalization decreases the performance


Problem description


I'm using PyTorch to implement a classification network for skeleton-based action recognition. The model consists of three convolutional layers and two fully connected layers. This base model gave me an accuracy of around 70% on the NTU-RGB+D dataset. I wanted to learn more about batch normalization, so I added batch normalization for all the layers except the last one. To my surprise, the evaluation accuracy dropped to 60% rather than increasing, while the training accuracy increased from 80% to 90%. Can anyone say what I am doing wrong? Or is adding batch normalization not guaranteed to increase accuracy?

Batch-normalized model

import torch
import torch.nn as nn


class BaseModelV0p2(nn.Module):

    def __init__(self, num_person, num_joint, num_class, num_coords, loss=None, metric=None):
        super().__init__()
        self.name = 'BaseModelV0p2'
        self.num_person = num_person
        self.num_joint = num_joint
        self.num_class = num_class
        self.channels = num_coords
        self.out_channel = [32, 64, 128]
        self.loss = loss      # 'loss' and 'metric' were undefined free names in the
        self.metric = metric  # original snippet; taken as constructor arguments here
        self.bn_momentum = 0.01

        self.bn_cv1 = nn.BatchNorm2d(self.out_channel[0], momentum=self.bn_momentum)
        self.conv1 = nn.Sequential(nn.Conv2d(in_channels=self.channels, out_channels=self.out_channel[0],
                                             kernel_size=3, stride=1, padding=1),
                                   self.bn_cv1,
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2, stride=2))

        self.bn_cv2 = nn.BatchNorm2d(self.out_channel[1], momentum=self.bn_momentum)
        self.conv2 = nn.Sequential(nn.Conv2d(in_channels=self.out_channel[0], out_channels=self.out_channel[1],
                                             kernel_size=3, stride=1, padding=1),
                                   self.bn_cv2,
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2, stride=2))

        self.bn_cv3 = nn.BatchNorm2d(self.out_channel[2], momentum=self.bn_momentum)
        self.conv3 = nn.Sequential(nn.Conv2d(in_channels=self.out_channel[1], out_channels=self.out_channel[2],
                                             kernel_size=3, stride=1, padding=1),
                                   self.bn_cv3,
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2, stride=2))

        self.bn_fc1 = nn.BatchNorm1d(256 * 2, momentum=self.bn_momentum)
        self.fc1 = nn.Sequential(nn.Linear(self.out_channel[2]*8*3, 256*2),
                                 self.bn_fc1,
                                 nn.ReLU(),
                                 nn.Dropout(p=0.5))  # nn.Dropout rather than nn.Dropout2d: the input here is a 2-D (batch, features) tensor

        self.fc2 = nn.Sequential(nn.Linear(256*2, self.num_class))

    def forward(self, input):
        list_bn_layers = [self.bn_fc1, self.bn_cv3, self.bn_cv2, self.bn_cv1]
        # set the momentum of the batch norm layers to the given momentum value during training and 0 during evaluation
        # ref: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561
        # ref: https://github.com/pytorch/pytorch/issues/4741
        for bn_layer in list_bn_layers:
            if self.training:
                bn_layer.momentum = self.bn_momentum
            else:
                bn_layer.momentum = 0

        logits = []
        for i in range(self.num_person):
            out = self.conv1(input[:, :, :, :, i])  # per-person slice; input is presumably (batch, coords, frames, joints, persons)

            out = self.conv2(out)

            out = self.conv3(out)

            logits.append(out)

        out = torch.max(logits[0], logits[1])  # element-wise max over the two person streams (assumes num_person == 2)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.fc2(out)

        t = out

        assert not ((t != t).any())  # find out nan in tensor
        assert not (t.abs().sum() == 0)  # find out 0 tensor

        return out
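
For reference, a minimal smoke test of this model (not from the original post). The concrete dimensions are assumptions chosen so that the flattened feature map matches the fc1 input size of 128*8*3: 3-D coordinates, 64 frames, 25 joints, 2 persons, and 60 classes as in NTU-RGB+D.

# Hypothetical smoke test; all dimensions below are assumed, not from the post.
model = BaseModelV0p2(num_person=2, num_joint=25, num_class=60, num_coords=3)
model.eval()  # inference mode: BatchNorm layers use their running statistics
x = torch.randn(4, 3, 64, 25, 2)  # (batch, coords, frames, joints, persons)
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([4, 60])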

Recommended answer


My interpretation of the phenomenon you are observing is that instead of reducing the covariate shift, which is what Batch Normalization is meant for, you are increasing it. In other words, instead of decreasing the distribution differences between train and test, you are increasing them, and that is what is causing the bigger gap in accuracy between train and test. Batch Normalization does not always guarantee better performance; for some problems it simply does not work well. I have several ideas that could lead to an improvement:

  • Increase the batch size if it is small, which would help the mean and std calculated in the Batch Norm layers be more robust estimates of the population parameters.
  • Decrease the bn_momentum parameter a bit, to see if that also stabilizes the Batch Norm statistics.
  • I am not sure you should set bn_momentum to zero at test time; I think you should simply call model.train() when you want to train and model.eval() when you want to use your trained model for inference (a minimal skeleton of that pattern is sketched after this list).
  • You could alternatively try Layer Normalization instead of Batch Normalization, since it does not require accumulating any statistics and usually works well (see the second sketch below).
  • Try regularizing your model a bit with dropout.
  • Make sure you shuffle your training set in every epoch. Not shuffling the data set may lead to correlated batches that make the statistics in Batch Normalization cycle. That may impact your generalization. I hope some of these ideas work for you.
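
Regarding the train/eval point, a minimal sketch of the usual pattern, with the manual momentum toggling in forward() removed. The names train_set, val_loader and num_epochs, as well as the optimizer settings, are placeholders, not from the original post:

# Hypothetical training/evaluation skeleton.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)  # reshuffles every epoch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    model.train()  # BatchNorm normalizes with batch statistics and updates its running estimates
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()  # BatchNorm switches to the accumulated running statistics
    correct = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()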

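For the Layer Normalization suggestion, a minimal sketch of the swap for the first convolutional block. nn.LayerNorm needs the full normalized shape, so the post-convolution feature-map size (32 channels, 64 frames, 25 joints) is assumed here; nn.GroupNorm(1, 32) behaves similarly without fixing the spatial dimensions:

# Hypothetical conv1 variant with LayerNorm in place of BatchNorm2d;
# the (32, 64, 25) shape after the convolution is an assumption.
conv1 = nn.Sequential(nn.Conv2d(in_channels=3, out_channels=32,
                                kernel_size=3, stride=1, padding=1),
                      nn.LayerNorm([32, 64, 25]),  # per-sample normalization, no running statistics
                      nn.ReLU(),
                      nn.MaxPool2d(kernel_size=2, stride=2))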
