Tensorflow and Batch Normalization with Batch Size==1 => Outputs all zeros

Problem Description

I have a question about my understanding of BatchNorm (BN from here on).

I have a convnet working nicely, and I was writing tests to check shapes and output ranges. I noticed that when I set batch_size = 1, my model outputs zeros (both logits and activations).

I prototyped the simplest convnet with BN:

Input => Conv + ReLU => BN => Conv + ReLU => BN => Conv Layer + Tanh

The model is initialized with Xavier initialization. I guess that BN during training does some calculations that require batch_size > 1.
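
That guess can be checked directly by computing batch statistics by hand with tf.nn.moments (a minimal sketch of my own, separate from the model below): a batch of one normalizes to zero.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])

# Batch statistics over axis 0, as BN uses during training
mean, variance = tf.nn.moments(x, axes=[0])
normalized = (x - mean) / tf.sqrt(variance + 1e-5)

with tf.Session() as sess:
    # Batch of 1: mean == input and variance == 0, so the output is all zeros
    print(sess.run(normalized, feed_dict={x: [[1., 2., 3., 4.]]}))
    # Batch of 2: samples deviate from the batch mean, so the output is non-zero
    print(sess.run(normalized, feed_dict={x: [[1., 2., 3., 4.],
                                              [4., 3., 2., 1.]]}))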

I have found an issue in PyTorch that seems to talk about this: https://github.com/pytorch/pytorch/issues/1381

Could anyone explain this? It's still a little blurry to me.

Example run:

Important: the TensorLayer library is required for this script to run: pip install tensorlayer

import tensorflow as tf
import tensorlayer as tl

import numpy as np

def conv_net(inputs, is_training):

    xavier_initializer = tf.contrib.layers.xavier_initializer(uniform=True)
    normal_initializer = tf.random_normal_initializer(mean=1., stddev=0.02)

    # Input Layers
    network = tl.layers.InputLayer(inputs, name='input')

    fx = [64, 128, 256, 256, 256]

    for i, n_out_channel in enumerate(fx):

        with tf.variable_scope('h' + str(i + 1)):

            network = tl.layers.Conv2d(
                network,
                n_filter    = n_out_channel,
                filter_size = (5, 5),
                strides     = (2, 2),
                padding     = 'VALID',
                act         = tf.identity,
                W_init      = xavier_initializer,
                name        = 'conv2d'
            )

            network = tl.layers.BatchNormLayer(
                network,
                act        = tf.identity,
                is_train   = is_training,
                gamma_init = normal_initializer,
                name       = 'batch_norm'
            )

            network = tl.layers.PReluLayer(
                layer  = network,
                a_init = tf.constant_initializer(0.2),
                name   = 'activation'
            )

    ############# OUTPUT LAYER ###############

    with tf.variable_scope('h' + str(len(fx) + 1)):
        '''

        network = tl.layers.FlattenLayer(network, name='flatten')

        network = tl.layers.DenseLayer(
            network,
            n_units = 100,
            act     = tf.identity,
            W_init  = xavier_initializer,
            name    = 'dense'
        )

        '''

        output_filter_size = tuple([int(i) for i in network.outputs.get_shape()[1:3]])

        network = tl.layers.Conv2d(
            network,
            n_filter    = 100,
            filter_size = output_filter_size,
            strides     = (1, 1),
            padding     = 'VALID',
            act         = tf.identity,
            W_init      = xavier_initializer,
            name        = 'conv2d'
        )

        network = tl.layers.BatchNormLayer(
            network,
            act        = tf.identity,
            is_train   = is_training,
            gamma_init = normal_initializer,
            name       = 'batch_norm'
        )

        net_logits = network.outputs

        network.outputs = tf.nn.tanh(
            x        = network.outputs,
            name     = 'activation'
        )

        net_output = network.outputs

    return network, net_output, net_logits


if __name__ == '__main__':

    tf.logging.set_verbosity(tf.logging.DEBUG)

    #################################################
    #                MODEL DEFINITION               #
    #################################################

    PLH_SHAPE = [None, 256, 256, 3]

    input_plh = tf.placeholder(tf.float32, PLH_SHAPE, name='input_placeholder')

    convnet, net_out, net_logits = conv_net(input_plh, is_training=True)


    with tf.Session() as sess:
        tl.layers.initialize_global_variables(sess)

        convnet.print_params(details=True)

        #################################################
        #                  LAUNCH A RUN                 #
        #################################################

        for BATCH_SIZE in [1, 2]:

            INPUT_SHAPE = [BATCH_SIZE, 256, 256, 3]

            batch_data = np.random.random(size=INPUT_SHAPE)

            output, logits = sess.run(
                [net_out, net_logits],
                feed_dict={
                    input_plh: batch_data
                }
            )

            if tf.logging.get_verbosity() == tf.logging.DEBUG:
                print("\n\n###########################")

                print("\nBATCH SIZE = %d\n" % BATCH_SIZE)

            tf.logging.debug("output => Shape: %s - Mean: %e - Std: %f - Min: %f - Max: %f" % (
                output.shape,
                output.mean(),
                output.std(),
                output.min(),
                output.max()
            ))

            tf.logging.debug("logits => Shape: %s - Mean: %e - Std: %f - Min: %f - Max: %f" % (
                logits.shape,
                logits.mean(),
                logits.std(),
                logits.min(),
                logits.max()
            ))

            if tf.logging.get_verbosity() == tf.logging.DEBUG:
                print("###########################")

Which gives the following output:

###########################

BATCH SIZE = 1

DEBUG:tensorflow:output => Shape: (1, 1, 1, 100) - Mean: 0.000000e+00 - Std: 0.000000 - Min: 0.000000 - Max: 0.000000
DEBUG:tensorflow:logits => Shape: (1, 1, 1, 100) - Mean: 0.000000e+00 - Std: 0.000000 - Min: 0.000000 - Max: 0.000000
###########################


###########################

BATCH SIZE = 2

DEBUG:tensorflow:output => Shape: (2, 1, 1, 100) - Mean: -1.430511e-08 - Std: 0.760749 - Min: -0.779634 - Max: 0.779634
DEBUG:tensorflow:logits => Shape: (2, 1, 1, 100) - Mean: -4.768372e-08 - Std: 0.998715 - Min: -1.044437 - Max: 1.044437
###########################

Answer

You should probably read an explanation about Batch Normalization, such as this one. You can also take a look at tensorflow's related doc.

Basically, there are two ways you can do batch_norm, and both have problems dealing with a batch size of 1:

  • Using a moving mean and variance pixel per pixel, so they are tensors with the same shape as each sample in your batch. This is the one used in @layog's answer, and (I think) in the original paper, and the most used.

  • Using a moving mean and variance over the entire image / feature space, so they are just vectors (rank 1) of shape (n_channels,).

In both cases, you'll have:

output = gamma * (input - mean) / sigma + beta

"Beta"通常设置为0,而"gamma"设置为1,因为您在BN之后具有线性函数.

Beta is often set to 0 and gamma to 1, since you have linear functions right after BN.

During training, mean and variance are computed across the current batch, which causes problems when it is of size 1 (a short sketch follows the list below):

  • in the 1st case, you'll get mean=input, so output=0
  • in the 2nd case, mean will be the average value over all pixels, so it's better; but if your width and height are also 1, then you get mean=input again, so you get output=0.
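
To make the two cases concrete, here is a minimal NumPy sketch (my own illustration, not the actual TensorLayer internals) of both normalization schemes on a batch of size 1:

import numpy as np

eps = 1e-5
x = np.random.random(size=(1, 8, 8, 3))    # batch size 1, 8x8 image, 3 channels

# 1st case: statistics per pixel, computed over the batch axis only
mean1 = x.mean(axis=0)                     # shape (8, 8, 3), equals x[0]
out1  = (x - mean1) / np.sqrt(x.var(axis=0) + eps)

# 2nd case: statistics over batch and spatial axes, one value per channel
mean2 = x.mean(axis=(0, 1, 2))             # shape (3,)
out2  = (x - mean2) / np.sqrt(x.var(axis=(0, 1, 2)) + eps)

print(out1.std())    # 0.0  -> mean equals the input, output is all zeros
print(out2.std())    # ~1.0 -> pixels still vary around the per-channel mean

Note that the last layer of the script above outputs a (N, 1, 1, 100) tensor, i.e. width and height of 1, so with batch size 1 the second case degenerates in exactly the same way.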

I think most people (and the original method) use the 1st way, which is why you'll get 0 (although TF doc seems to suggest that the 2nd method is usual too). The argument in the link you're providing seems to be considering the 2nd method.

In any case (whichever you're using), with BN you'll only get good results if you use a bigger batch size (say, at least 10).
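
As a side note, this collapse only happens in training mode. If the graph is built with is_training=False, the BatchNormLayer normalizes with its accumulated moving mean and variance instead of the current batch statistics, so a single sample no longer zeroes out (assuming those moving averages have been trained first). In the script above that would be:

convnet, net_out, net_logits = conv_net(input_plh, is_training=False)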
