TensorFlow CNN: Why is the validation loss significantly different from the start, and why does it keep increasing?


Problem description

This is a classification model for ten categories of pictures. My code has three files: the CNN model convNet.py, read_TFRecord.py to read the data, and train.py to train and evaluate the model. The training set has 80,000 samples and the validation set has 20,000.

Problem:

In the first epoch:

training loss = 2.11, train accuracy = 25.61%

validation loss = 3.05, validation accuracy = 8.29%

Why is the validation loss significantly different right from the start? And why is the validation accuracy always below 10%?

Over the 10 training epochs:

The training process always learns normally, but the validation loss keeps slowly increasing and the validation accuracy keeps oscillating around 10%. Is it over-fitting? I have already taken some measures, such as adding regularization losses and dropout, but I do not know where the problem is. I hope you can help me.

convNet.py:

import tensorflow as tf
from tensorflow.contrib import learn


def convNet(features, mode):
    input_layer = tf.reshape(features, [-1, 100, 100, 3])
    tf.summary.image('input', input_layer)

    # conv1
    with tf.name_scope('conv1'):
         conv1 = tf.layers.conv2d(
             inputs=input_layer,
             filters=32,
             kernel_size=5,
             padding="same",
             activation=tf.nn.relu,
             kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
             name='conv1'
         )
         conv1_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv1')
         tf.summary.histogram('kernel', conv1_vars[0])
         tf.summary.histogram('bias', conv1_vars[1])
         tf.summary.histogram('act', conv1)

    # pool1  100->50
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2, name='pool1')

    # dropout
    pool1_dropout = tf.layers.dropout(
        inputs=pool1, rate=0.5, training=tf.equal(mode, learn.ModeKeys.TRAIN),
        name='pool1_dropout')

    # conv2
    with tf.name_scope('conv2'):
         conv2 = tf.layers.conv2d(
             inputs=pool1_dropout,
             filters=64,
             kernel_size=5,
             padding="same",
             activation=tf.nn.relu,
             kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
             name='conv2'
         )
         conv2_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv2')
         tf.summary.histogram('kernel', conv2_vars[0])
         tf.summary.histogram('bias', conv2_vars[1])
         tf.summary.histogram('act', conv2)

    # pool2  50->25
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2, name='pool2')

    # dropout
    pool2_dropout = tf.layers.dropout(
        inputs=pool2, rate=0.5, training=tf.equal(mode, learn.ModeKeys.TRAIN),
        name='pool2_dropout')

    # conv3
    with tf.name_scope('conv3'):
         conv3 = tf.layers.conv2d(
             inputs=pool2_dropout,
             filters=128,
             kernel_size=3,
             padding="same",
             activation=tf.nn.relu,
             kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
             name='conv3'
         )
         conv3_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv3')
         tf.summary.histogram('kernel', conv3_vars[0])
         tf.summary.histogram('bias', conv3_vars[1])
         tf.summary.histogram('act', conv3)

    # pool3  25->12
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[2, 2], strides=2, name='pool3')

    # dropout
    pool3_dropout = tf.layers.dropout(
        inputs=pool3, rate=0.5, training=tf.equal(mode, learn.ModeKeys.TRAIN),
        name='pool3_dropout')

    # conv4
    with tf.name_scope('conv4'):
         conv4 = tf.layers.conv2d(
             inputs=pool3_dropout,
             filters=128,
             kernel_size=3,
             padding="same",
             activation=tf.nn.relu,
             kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
             name='conv4'
         )
         conv4_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv4')
         tf.summary.histogram('kernel', conv4_vars[0])
         tf.summary.histogram('bias', conv4_vars[1])
         tf.summary.histogram('act', conv4)

    # pool4  12->6
    pool4 = tf.layers.max_pooling2d(inputs=conv4, pool_size=[2, 2], strides=2, name='pool4')

    # dropout
    pool4_dropout = tf.layers.dropout(
        inputs=pool4, rate=0.5, training=tf.equal(mode, learn.ModeKeys.TRAIN),
        name='pool4_dropout')

    pool4_flat = tf.reshape(pool4_dropout, [-1, 6 * 6 * 128])

    # fc1
    with tf.name_scope('fc1'):
         fc1 = tf.layers.dense(inputs=pool4_flat, units=1024, activation=tf.nn.relu,
                          kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
                          kernel_regularizer=tf.contrib.layers.l2_regularizer(0.01),
                          name='fc1')
         fc1_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'fc1')
         tf.summary.histogram('kernel', fc1_vars[0])
         tf.summary.histogram('bias', fc1_vars[1])
         tf.summary.histogram('act', fc1)

    # dropout
    fc1_dropout = tf.layers.dropout(
        inputs=fc1, rate=0.3, training=tf.equal(mode, learn.ModeKeys.TRAIN),
        name='fc1_dropout')

    # fc2
    with tf.name_scope('fc2'):
         fc2 = tf.layers.dense(inputs=fc1_dropout, units=512, activation=tf.nn.relu,
                          kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
                          kernel_regularizer=tf.contrib.layers.l2_regularizer(0.01),
                          name='fc2')
         fc2_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'fc2')
         tf.summary.histogram('kernel', fc2_vars[0])
         tf.summary.histogram('bias', fc2_vars[1])
         tf.summary.histogram('act', fc2)

    # dropout
    fc2_dropout = tf.layers.dropout(
        inputs=fc2, rate=0.3, training=tf.equal(mode, learn.ModeKeys.TRAIN),
        name='fc2_dropout')

    # logits
    with tf.name_scope('out'):
         logits = tf.layers.dense(inputs=fc2_dropout, units=10, activation=None,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
                             kernel_regularizer=tf.contrib.layers.l2_regularizer(0.01),
                             name='out')
         out_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'out')
         tf.summary.histogram('kernel', out_vars[0])
         tf.summary.histogram('bias', out_vars[1])
         tf.summary.histogram('act', logits)

    return logits

read_TFRecord.py:

import tensorflow as tf


def read_and_decode(filename, width, height, channel):
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw': tf.FixedLenFeature([], tf.string),
                                       })
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [width, height, channel])
    # scale pixel values to [-0.5, 0.5]
    img = tf.cast(img, tf.float16) * (1. / 255) - 0.5
    label = tf.cast(features['label'], tf.int16)
    return img, label

train.py:

import tensorflow as tf
from tensorflow.contrib import learn

from convNet import convNet
from read_TFRecord import read_and_decode

# step 1
TRAIN_TFRECORD = 'F:/10-image-set2/train.tfrecords'  # train data set
VAL_TFRECORD = 'F:/10-image-set2/val.tfrecords'  # validation data set
WIDTH = 100  # image width
HEIGHT = 100  # image height
CHANNEL = 3  # image channel
TRAIN_BATCH_SIZE = 64
VAL_BATCH_SIZE = 16
train_img, train_label = read_and_decode(TRAIN_TFRECORD, WIDTH, HEIGHT, 
                         CHANNEL)
val_img, val_label = read_and_decode(VAL_TFRECORD, WIDTH, HEIGHT, CHANNEL)
x_train_batch, y_train_batch = tf.train.shuffle_batch([train_img, 
                               train_label], batch_size=TRAIN_BATCH_SIZE, 
                               capacity=80000,min_after_dequeue=79999, 
                               num_threads=64,name='train_shuffle_batch')
x_val_batch, y_val_batch = tf.train.shuffle_batch([val_img, val_label],
                           batch_size=VAL_BATCH_SIZE, 
                           capacity=20000,min_after_dequeue=19999, 
                           num_threads=64, name='val_shuffle_batch')

# step 2
x = tf.placeholder(tf.float32, shape=[None, WIDTH, HEIGHT, CHANNEL], 
                   name='x')
y_ = tf.placeholder(tf.int32, shape=[None, ], name='y_')
mode = tf.placeholder(tf.string, name='mode')
step = tf.get_variable(shape=(), dtype=tf.int32,
                       initializer=tf.zeros_initializer(), name='step')
tf.add_to_collection(tf.GraphKeys.GLOBAL_STEP, step)
logits = convNet(x, mode) 
with tf.name_scope('Reg_losses'):
     reg_losses = tf.cond(tf.equal(mode, learn.ModeKeys.TRAIN),
                     lambda: tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)),
                     lambda: tf.constant(0, dtype=tf.float32))
with tf.name_scope('Loss'):
     loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=logits) + reg_losses
train_op = tf.train.AdamOptimizer().minimize(loss, step)
correct_prediction = tf.equal(tf.cast(tf.argmax(logits, 1), tf.int32), y_)
with tf.name_scope('Accuracy'):
     acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# step 3
tf.summary.scalar("reg_losses", reg_losses)
tf.summary.scalar("loss", loss)
tf.summary.scalar("accuracy", acc)
merged = tf.summary.merge_all()

# step 4
with tf.Session() as sess:
     summary_dir = './logs/summary/'

     sess.run(tf.global_variables_initializer())
     saver = tf.train.Saver()   
     saver = tf.train.Saver(max_to_keep=1)

     train_writer = tf.summary.FileWriter(summary_dir + 'train',
                                     sess.graph)
     valid_writer = tf.summary.FileWriter(summary_dir + 'valid')

     coord = tf.train.Coordinator()
     threads = tf.train.start_queue_runners(sess=sess, coord=coord) 
     max_acc = 0
     MAX_EPOCH = 10
     for epoch in range(MAX_EPOCH):
         # training
         train_step = int(80000 / TRAIN_BATCH_SIZE)
         train_loss, train_acc = 0, 0
         for step in range(epoch * train_step, (epoch + 1) * train_step):
             x_train, y_train = sess.run([x_train_batch, y_train_batch])
             train_summary, _, err, ac = sess.run(
                 [merged, train_op, loss, acc],
                 feed_dict={x: x_train, y_: y_train,
                            mode: learn.ModeKeys.TRAIN,
                            global_step: step})
             train_loss += err
             train_acc += ac
             if (step + 1) % 50 == 0:
                 train_writer.add_summary(train_summary, step)
         print("Epoch %d,train loss= %.2f,train accuracy=%.2f%%" % (
          epoch, (train_loss / train_step), (train_acc / train_step * 100.0)))

         # validation
         val_step = int(20000 / VAL_BATCH_SIZE)
         val_loss, val_acc = 0, 0
         for step in range(epoch * val_step, (epoch + 1) * val_step):
             x_val, y_val = sess.run([x_val_batch, y_val_batch])
             val_summary, err, ac = sess.run([merged, loss, acc],
                                        feed_dict={x: x_val, y_: y_val, mode: learn.ModeKeys.EVAL,
                                                   global_step: step})
             val_loss += err
             val_acc += ac
             if (step + 1) % 50 == 0:
                 valid_writer.add_summary(val_summary, step)
         print(
           "Epoch %d,validation loss= %.2f,validation accuracy=%.2f%%" % (
            epoch, (val_loss / val_step), (val_acc / val_step * 100.0)))

         # save model
         if val_acc > max_acc:
             max_acc = val_acc
             saver.save(sess, summary_dir + '/10-image.ckpt', epoch)
             print("model saved")
     coord.request_stop()
     coord.join(threads)

Tensorboard result:

(Orange is train. Blue is validation.)

[TensorBoard screenshots: accuracy, loss, reg_losses, conv1, conv2, conv3, conv4, fc1, fc2, out]

My data: [sample images: train / val]

Recommended answer

I doubt this is an issue of overfitting - the losses are significantly different right from the start and diverge further well before you get through your first epoch (~500 batches). Without seeing your dataset it's difficult to say more, though as a first step I'd encourage you to visualize both training and evaluation input data to make sure the issue isn't something there. The fact that you get significantly less than 10% on a 10-class classification problem initially indicates you almost certainly don't have something wrong here.
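For example, a minimal sketch of one way to eyeball a batch from each input pipeline (assuming the shuffle_batch tensors defined in train.py above; the use of matplotlib is my own addition, not part of the original code):

import matplotlib.pyplot as plt
import tensorflow as tf

# Sketch only: pull one batch from the training pipeline and one from the
# validation pipeline and inspect them by eye. Assumes x_train_batch,
# y_train_batch, x_val_batch and y_val_batch are the shuffle_batch tensors
# built in train.py above.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    for name, img_batch, label_batch in [('train', x_train_batch, y_train_batch),
                                         ('val', x_val_batch, y_val_batch)]:
        imgs, labels = sess.run([img_batch, label_batch])
        plt.figure(name)
        for i in range(8):
            plt.subplot(2, 4, i + 1)
            # read_and_decode scales pixels to [-0.5, 0.5]; shift back for display.
            plt.imshow(imgs[i].astype('float32') + 0.5)
            plt.title(str(int(labels[i])))
            plt.axis('off')
    plt.show()

    coord.request_stop()
    coord.join(threads)

If the decoded validation images or their labels look wrong here, the input pipeline rather than the model is the place to dig.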

Having said that, you may well run into problems with over-fitting with this model, because despite what you might think you aren't actually using dropout or regularization:

Dropout: mode == learn.ModeKeys.TRAIN is false if mode is a tensor, so you're not using dropout at all. You could use tf.equal(mode, learn.ModeKeys.TRAIN), but I think you'd be much better off passing a training bool tensor to your convNet and feeding in the appropriate value.
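
As a standalone sketch of that suggestion (a toy example of my own, not the original model), a boolean placeholder can switch dropout on and off directly; the same training tensor would replace the tf.equal(mode, ...) arguments inside convNet:

import numpy as np
import tensorflow as tf

# Toy sketch: dropout driven by an explicit bool placeholder.
x_in = tf.placeholder(tf.float32, shape=[None, 8], name='x_in')
training = tf.placeholder(tf.bool, shape=(), name='training')

dropped = tf.layers.dropout(x_in, rate=0.5, training=training)

with tf.Session() as sess:
    data = np.ones((1, 8), dtype=np.float32)
    # With training=True roughly half the activations are zeroed (and the rest
    # scaled by 1/(1 - rate)); with training=False the input passes through unchanged.
    print(sess.run(dropped, feed_dict={x_in: data, training: True}))
    print(sess.run(dropped, feed_dict={x_in: data, training: False}))

The feed dict in train.py would then supply training: True on training steps and training: False during evaluation.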

Regularization: you're creating the regularization loss terms and they're being added to the tf.GraphKeys.REGULARIZATION_LOSSES collection, but the loss you're minimizing doesn't use them. Add the following:

loss += tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))

prior to optimizing.
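
For concreteness, a self-contained toy sketch of the pattern (my own example, not the original graph): the L2 penalty registered by the kernel_regularizer only affects training once it is added to the loss handed to the optimizer.

import tensorflow as tf

# Toy sketch: a regularized dense layer whose L2 penalty has to be added to
# the minimized loss by hand.
x_in = tf.placeholder(tf.float32, shape=[None, 4])
y_in = tf.placeholder(tf.int32, shape=[None])

logits = tf.layers.dense(x_in, 10,
                         kernel_regularizer=tf.contrib.layers.l2_regularizer(0.01))

data_loss = tf.losses.sparse_softmax_cross_entropy(labels=y_in, logits=logits)
# The regularizer only records its penalty in this collection; it is not
# applied automatically.
reg_loss = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
total_loss = data_loss + reg_loss

train_op = tf.train.AdamOptimizer().minimize(total_loss)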

A note on the optimization step: you shouldn't be feeding in a value to the session run like you are. Every time you run the optimization operation it will update the value passed as step when you created it, so just create it as an int variable and leave it alone. See the following example code:

import tensorflow as tf

x = tf.get_variable(shape=(4, 3), dtype=tf.float32,
                    initializer=tf.random_normal_initializer(), name='x')
loss = tf.nn.l2_loss(x)
step = tf.get_variable(shape=(), dtype=tf.int32,
                       initializer=tf.zeros_initializer(), name='step')
tf.add_to_collection(tf.GraphKeys.GLOBAL_STEP, step)  # good practice

opt = tf.train.AdamOptimizer().minimize(loss, step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step_val = sess.run(step)
    print(step_val)  # 0
    sess.run(opt)
    step_val = sess.run(step)
    print(step_val)  # 1
