tf.layers.batch_normalization large test error


Problem description

I'm trying to use batch normalization. I tried to use tf.layers.batch_normalization on a simple conv net for MNIST.

I get high accuracy during training (>98%) but very low test accuracy (<50%). I tried changing the momentum value (0.8, 0.9, 0.99, 0.999) and playing with batch sizes, but it basically always behaves the same way. I train for 20k iterations.

My code

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST and set run parameters (placeholders; the question does not show these definitions)
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
log_dir = './logs'    # placeholder log directory
batch_size = 100      # example value; the question only says several batch sizes were tried
t_iter = 20000        # the question trains for 20k iterations

# Input placeholders
x = tf.placeholder(tf.float32, [None, 784], name='x-input')
y_ = tf.placeholder(tf.float32, [None, 10], name='y-input')
is_training = tf.placeholder(tf.bool)

# input layer
input_layer = tf.reshape(x, [-1, 28, 28, 1])
with tf.name_scope('conv1'):
    #Convolution #1 ([5,5] : [28x28x1]->[28x28x6])
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=6,
        kernel_size=[5, 5],
        padding="same",
        activation=None
    )   

    #Batch Norm #1
    conv1_bn = tf.layers.batch_normalization(
        inputs=conv1,
        axis=-1,
        momentum=0.9,
        epsilon=0.001,
        center=True,
        scale=True,
        training = is_training,
        name='conv1_bn'
    )

    #apply relu
    conv1_bn_relu = tf.nn.relu(conv1_bn)
    #apply pool ([2,2] : [28x28x6]->[14X14X6])
    maxpool1=tf.layers.max_pooling2d(
        inputs=conv1_bn_relu,
        pool_size=[2,2],
        strides=2,
        padding="valid"
    )

with tf.name_scope('conv2'):
    #Convolution #2 ([5x5] : [14x14x6]->[14x14x16])
    conv2 = tf.layers.conv2d(
        inputs=maxpool1,
        filters=16,
        kernel_size=[5, 5],
        padding="same",
        activation=None
    )   

    #Batch Norm #2
    conv2_bn = tf.layers.batch_normalization(
        inputs=conv2,
        axis=-1,
        momentum=0.999,
        epsilon=0.001,
        center=True,
        scale=True,
        training = is_training
    )

    #apply relu
    conv2_bn_relu = tf.nn.relu(conv2_bn)
    #maxpool2 ([2,2] : [14x14x16]->[7x7x16])
    maxpool2=tf.layers.max_pooling2d(
        inputs=conv2_bn_relu,
        pool_size=[2,2],
        strides=2,
        padding="valid"
    )

#fully connected 1 [7*7*16 = 784 -> 120]
maxpool2_flat=tf.reshape(maxpool2,[-1,7*7*16])
fc1 = tf.layers.dense(
    inputs=maxpool2_flat,
    units=120,
    activation=None
)

#Batch Norm #3
fc1_bn = tf.layers.batch_normalization(
    inputs=fc1,
    axis=-1,
    momentum=0.999,
    epsilon=0.001,
    center=True,
    scale=True,
    training = is_training
)
#apply relu

fc1_bn_relu = tf.nn.relu(fc1_bn)

#fully connected 2 [120-> 84]
fc2 = tf.layers.dense(
    inputs=fc1_bn_relu,
    units=84,
    activation=None
)

#apply relu
fc2_bn_relu = tf.nn.relu(fc2)

#fully connected 3 [84->10]. Output layer with softmax
y = tf.layers.dense(
    inputs=fc2_bn_relu,
    units=10,
    activation=None
)

#loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
tf.summary.scalar('cross entropy', cross_entropy)

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy',accuracy)

#merge summaries and init train writer
sess = tf.Session()
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(log_dir + '/train' ,sess.graph)
test_writer = tf.summary.FileWriter(log_dir + '/test') 
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
init = tf.global_variables_initializer()
sess.run(init)

with sess.as_default():
    def get_variables_values():
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
        values = {}
        for variable in variables:
            values[variable.name[:-2]] = sess.run(variable, feed_dict={
                x:batch[0], y_:batch[1], is_training:True
                })
        return values


    for i in range(t_iter):
        batch = mnist.train.next_batch(batch_size)
        if i%100 == 0: #test-set summary
            print('####################################')
            values = get_variables_values()
            print('moving variance is:')
            print(values["conv1_bn/moving_variance"])
            print('moving mean is:')
            print(values["conv1_bn/moving_mean"])
            print('gamma is:')
            print(values["conv1_bn/gamma/Adam"])
            print('beta is:')
            print(values["conv1_bn/beta/Adam"])
            summary, acc = sess.run([merged,accuracy], feed_dict={
                x:mnist.test.images, y_:mnist.test.labels, is_training:False
            })

        else:
            summary, _ = sess.run([merged,train_step], feed_dict={
                x:batch[0], y_:batch[1], is_training:True
            })
            if i%10 == 0:
                train_writer.add_summary(summary,i)

I think the problem is that the moving_mean/var is not being updated. I print the moving_mean/var during the run and get:

moving variance is:
[1. 1. 1. 1. 1. 1.]
moving mean is:
[0. 0. 0. 0. 0. 0.]
gamma is:
[-0.00055969 0.00164391 0.00163301 -0.00206227 -0.00011434 -0.00070161]
beta is:
[-0.00232835 -0.00040769 0.00114277 -0.0025414 -0.00049697 0.00221556]

Does anyone have any idea what I'm doing wrong?

Answer


The operations which tf.layers.batch_normalization adds to update mean and variance don't automatically get added as dependencies of the train operation - so if you don't do anything extra, they never get run. (Unfortunately, the documentation doesn't currently mention this. I'm opening an issue about it.)


Luckily, the update operations are easy to get at, since they're added to the tf.GraphKeys.UPDATE_OPS collection. Then you can either run the extra operations manually:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
sess.run([train_op, extra_update_ops], ...)
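
Applied to the training loop in the question, the manual variant could look roughly like this (a sketch that reuses the question's own names such as train_step, merged and is_training; nothing else in the loop has to change):

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
for i in range(t_iter):
    batch = mnist.train.next_batch(batch_size)
    # fetch the update ops together with the train step so the moving
    # mean/variance of every batch-norm layer actually get refreshed
    summary, _, _ = sess.run([merged, train_step, extra_update_ops], feed_dict={
        x: batch[0], y_: batch[1], is_training: True
    })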


Or add them as dependencies of your training operation, and then just run your training operation as normal:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_op = optimizer.minimize(loss)
...
sess.run([train_op], ...)
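
In the question's code that amounts to building train_step inside the control_dependencies block (a sketch based on the optimizer call already shown above; the training loop itself stays unchanged):

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    # the moving mean/variance update ops now run every time train_step runs
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

With either variant the printed moving_mean/moving_variance should start to drift away from 0 and 1 during training, and the test-time run with is_training:False should give accuracy much closer to the training accuracy.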

