Adding batch normalization decreases the performance

Problem description

I'm using PyTorch to implement a classification network for skeleton-based action recognition. The model consists of three convolutional layers and two fully connected layers. This base model gave me an accuracy of around 70% on the NTU-RGB+D dataset. I wanted to learn more about batch normalization, so I added batch normalization for all the layers except the last one. To my surprise, the evaluation accuracy dropped to 60% rather than increasing, but the training accuracy increased from 80% to 90%. Can anyone say what I am doing wrong? Or is it that adding batch normalization need not increase the accuracy?

Model with batch normalization

import torch
import torch.nn as nn


class BaseModelV0p2(nn.Module):

    def __init__(self, num_person, num_joint, num_class, num_coords, loss=None, metric=None):
        super().__init__()
        self.name = 'BaseModelV0p2'
        self.num_person = num_person
        self.num_joint = num_joint
        self.num_class = num_class
        self.channels = num_coords
        self.out_channel = [32, 64, 128]
        # store the loss and metric passed in, so the class is self-contained
        self.loss = loss
        self.metric = metric
        self.bn_momentum = 0.01

        self.bn_cv1 = nn.BatchNorm2d(self.out_channel[0], momentum=self.bn_momentum)
        self.conv1 = nn.Sequential(nn.Conv2d(in_channels=self.channels, out_channels=self.out_channel[0],
                                             kernel_size=3, stride=1, padding=1),
                                   self.bn_cv1,
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2, stride=2))

        self.bn_cv2 = nn.BatchNorm2d(self.out_channel[1], momentum=self.bn_momentum)
        self.conv2 = nn.Sequential(nn.Conv2d(in_channels=self.out_channel[0], out_channels=self.out_channel[1],
                                             kernel_size=3, stride=1, padding=1),
                                   self.bn_cv2,
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2, stride=2))

        self.bn_cv3 = nn.BatchNorm2d(self.out_channel[2], momentum=self.bn_momentum)
        self.conv3 = nn.Sequential(nn.Conv2d(in_channels=self.out_channel[1], out_channels=self.out_channel[2],
                                             kernel_size=3, stride=1, padding=1),
                                   self.bn_cv3,
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2, stride=2))

        self.bn_fc1 = nn.BatchNorm1d(256 * 2, momentum=self.bn_momentum)
        self.fc1 = nn.Sequential(nn.Linear(self.out_channel[2] * 8 * 3, 256 * 2),
                                 self.bn_fc1,
                                 nn.ReLU(),
                                 nn.Dropout(p=0.5))  # input here is 2D, so plain Dropout fits better than Dropout2d

        self.fc2 = nn.Sequential(nn.Linear(256*2, self.num_class))

    def forward(self, x):  # x: (batch, coords, frames, joints, persons)
        list_bn_layers = [self.bn_fc1, self.bn_cv3, self.bn_cv2, self.bn_cv1]
        # Set the momentum of the batch norm layers to the given momentum value
        # during training and to 0 during evaluation.
        # ref: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561
        # ref: https://github.com/pytorch/pytorch/issues/4741
        for bn_layer in list_bn_layers:
            if self.training:
                bn_layer.momentum = self.bn_momentum
            else:
                bn_layer.momentum = 0

        # run each person's skeleton through the shared convolutional stack
        logits = []
        for i in range(self.num_person):
            out = self.conv1(x[:, :, :, :, i])
            out = self.conv2(out)
            out = self.conv3(out)
            logits.append(out)

        out = torch.max(logits[0], logits[1])  # element-wise max over the two persons (assumes num_person == 2)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.fc2(out)

        t = out

        assert not ((t != t).any())  # NaN check: (t != t) is true only for NaN entries
        assert not (t.abs().sum() == 0)  # guard against an all-zero output tensor

        return out
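
As a sanity check, here is a minimal, hypothetical shape test for the model above (my assumed shapes, not from the original post). It assumes NTU-RGB+D style input laid out as (batch, coords, frames, joints, persons) with 3 coordinates, 64 frames, 25 joints, and 2 persons; three rounds of 2x2 max pooling then reduce 64 x 25 to exactly the 8 x 3 feature map that fc1 expects.

# Hypothetical smoke test: 64 frames -> 8 and 25 joints -> 3 after three
# 2x2 max-pool stages, matching fc1's input size of 128 * 8 * 3.
# num_class=60 assumes the 60 action labels of NTU-RGB+D.
model = BaseModelV0p2(num_person=2, num_joint=25, num_class=60, num_coords=3)
model.eval()  # use the batch norm layers' running statistics
x = torch.randn(4, 3, 64, 25, 2)  # (batch, coords, frames, joints, persons)
with torch.no_grad():
    out = model(x)
print(out.shape)  # expected: torch.Size([4, 60])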

Answer

My interpretation of the phenomenon you are observing is that instead of reducing the internal covariate shift, which is what batch normalization is meant for, you are increasing it. In other words, instead of decreasing the distribution differences between training and test, you are increasing them, and that is what causes the larger gap between training and test accuracy. Batch normalization does not always guarantee better performance; for some problems it simply does not work well. I have several ideas that could lead to an improvement:

  • Increase the batch size if it is small; this helps the mean and standard deviation computed in the batch norm layers be more robust estimates of the population parameters.
  • Decrease the bn_momentum parameter a bit, to see whether that also stabilizes the batch norm statistics.
  • I am not sure you should set bn_momentum to zero at test time. I think you should simply call model.train() when you want to train and model.eval() when you want to use the trained model for inference (see the sketch after this list).
  • You could alternatively try layer normalization instead of batch normalization, because it does not require accumulating any statistics and usually works well.
  • Try regularizing your model a bit with dropout.
  • Make sure you shuffle your training set every epoch. Not shuffling the data set may lead to correlated batches that make the batch normalization statistics cycle, which can hurt generalization.

I hope some of these ideas work for you.
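
To illustrate the third point, here is a minimal sketch of a training loop that relies on model.train() / model.eval() instead of manually zeroing the momentum. It assumes model is an instance of BaseModelV0p2 as defined above; num_epochs, train_loader, and val_loader are placeholder names, not from the original code.

# Hypothetical skeleton: the point is the train()/eval() calls, which switch
# batch norm between per-batch statistics and the accumulated running stats.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    model.train()  # batch norm uses batch stats and updates running stats
    for x, y in train_loader:  # DataLoader with shuffle=True (see last bullet)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()  # batch norm switches to the running statistics
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    print(f"epoch {epoch}: validation accuracy {correct / total:.3f}")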

