Cannot use both bias and batch normalization in convolution layers


Problem description

I use the slim framework for TensorFlow because of its simplicity, but I want a convolutional layer with both a bias and batch normalization. In vanilla TensorFlow I have:

def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())

        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", biases)

        return conv

d_bn1 = BatchNorm(name='d_bn1')
h1 = lrelu(d_bn1(conv2d(h0, df_dim + y_dim, name='d_h1_conv')))

and I rewrote it to slim like this:

h1 = slim.conv2d(h0,
                 num_outputs=self.df_dim + self.y_dim,
                 scope='d_h1_conv',
                 kernel_size=[5, 5],
                 stride=[2, 2],
                 activation_fn=lrelu,
                 normalizer_fn=layers.batch_norm,
                 normalizer_params=batch_norm_params,                           
                 weights_initializer=layers.xavier_initializer(uniform=False),
                 biases_initializer=tf.constant_initializer(0.0)
                 )
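
(batch_norm_params is not defined in the question; just for context, a typical slim batch_norm configuration might look like the following sketch, with all values purely illustrative:)

batch_norm_params = {
    'is_training': is_training,                      # assumed placeholder/bool, not shown in the question
    'decay': 0.9,                                    # illustrative value
    'epsilon': 1e-5,                                 # illustrative value
    'updates_collections': tf.GraphKeys.UPDATE_OPS,
}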

But this code does not add a bias to the conv layer. That is because of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L1025, where we have

    layer = layer_class(filters=num_outputs,
                    kernel_size=kernel_size,
                    strides=stride,
                    padding=padding,
                    data_format=df,
                    dilation_rate=rate,
                    activation=None,
                    use_bias=not normalizer_fn and biases_initializer,
                    kernel_initializer=weights_initializer,
                    bias_initializer=biases_initializer,
                    kernel_regularizer=weights_regularizer,
                    bias_regularizer=biases_regularizer,
                    activity_regularizer=None,
                    trainable=trainable,
                    name=sc.name,
                    dtype=inputs.dtype.base_dtype,
                    _scope=sc,
                    _reuse=reuse)
    outputs = layer.apply(inputs)

in the construction of the layer, which results in no bias being added when batch normalization is used. Does that mean that I cannot have both a bias and batch normalization using the slim and layers library? Or is there another way to get both a bias and batch normalization in a layer when using slim?
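
To make the mechanism concrete, here is a minimal sketch (my own illustration, assuming TF 1.x with contrib available) of how the use_bias expression above evaluates once a normalizer function is supplied:

import tensorflow as tf
from tensorflow.contrib import layers

normalizer_fn = layers.batch_norm                  # what gets passed via normalizer_fn=...
biases_initializer = tf.constant_initializer(0.0)  # what gets passed via biases_initializer=...

# As soon as any normalizer_fn is given, `not normalizer_fn` is False,
# so the whole expression is False and the conv layer is built without a bias.
use_bias = not normalizer_fn and biases_initializer
print(bool(use_bias))   # False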

Recommended answer

Batch normalization already includes the addition of a bias term. Recall that BatchNorm is already:

gamma * normalized(x) + bias

So there is no need (and it makes no sense) to add another bias term in the convolution layer. Simply speaking, BatchNorm shifts the activations by their mean value, so any constant added before it is canceled out.
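
A minimal numerical sketch (not part of the original answer) of why the constant is absorbed: any bias b added before normalization disappears when the mean is subtracted.

import numpy as np

def normalize(x, eps=1e-5):
    # just the mean/variance part of batch norm, without the learned gamma/beta
    return (x - x.mean()) / np.sqrt(x.var() + eps)

x = np.random.randn(1000).astype(np.float32)
b = 3.7   # an arbitrary constant bias

# Adding a constant before normalization changes nothing: the mean shifts by b
# and is subtracted right back out.
print(np.allclose(normalize(x), normalize(x + b), atol=1e-5))   # True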

If you still want to do this, you need to remove the normalizer_fn argument and add BatchNorm as a separate layer. As I said, this makes no sense.

But the solution would look something like:

net = slim.conv2d(net, normalizer_fn=None, ...)
net = tf.nn.batch_normalization(net)   # schematic; the real call also needs mean, variance, offset, scale, variance_epsilon
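
A more concrete sketch of that approach, assuming TF 1.x with slim; the function name, `is_training` argument and parameter values are my own choices, and the explicit bias is kept only to show that it is possible, even though it is redundant:

import tensorflow as tf
import tensorflow.contrib.slim as slim

def conv_bias_bn(net, num_outputs, is_training, scope='conv_bias_bn'):
    # Convolution with an explicit bias: normalizer_fn stays None,
    # so slim keeps use_bias=True and creates the biases variable.
    net = slim.conv2d(net,
                      num_outputs=num_outputs,
                      kernel_size=[5, 5],
                      stride=2,
                      activation_fn=None,
                      normalizer_fn=None,
                      biases_initializer=tf.constant_initializer(0.0),
                      scope=scope)
    # Batch norm added as a separate layer; its learned beta makes the
    # conv bias above redundant, as explained in the answer.
    net = slim.batch_norm(net, is_training=is_training, scope=scope + '_bn')
    return tf.nn.leaky_relu(net)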

Note that BatchNorm relies on non-gradient updates (the moving mean and variance). So you either need to use an optimizer that is compatible with the UPDATE_OPS collection, or you need to add the tf.control_dependencies manually.
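
A minimal sketch of the manual tf.control_dependencies variant, assuming TF 1.x and an existing `loss` tensor; AdamOptimizer and the learning rate are just placeholders:

# Run the batch-norm moving-average updates together with the train step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)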

Long story short: even if you implement ConvWithBias+BatchNorm, it will behave like ConvWithoutBias+BatchNorm, just as stacking multiple fully-connected layers without activation functions behaves like a single one.
