Tensorflow model accuracy not increasing


Problem description

I am currently doing the Deep Learning course on Udacity and am presently trying to complete the 4th assignment, where you are supposed to create your own model and see the best accuracy you can achieve on the notMNIST dataset.

I have tried to implement the VGG-16 model but have run into a few problems. Initially, the loss was going straight to NaN, so I changed the last activation function from relu to sigmoid, but now the accuracy does not improve and is stuck at around 0-6%, so I'm guessing my implementation is wrong but I can't seem to see the mistake. I would greatly appreciate any help or advice!

Below is my full code, other than reading in the dataset, as that code was provided so I'm guessing it's right.

import pickle
import tensorflow as tf

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28
num_labels = 10
num_channels = 1  # grayscale

import numpy as np


def reformat(dataset, labels):
    dataset = dataset.reshape(
        (-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    return dataset, labels


train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])


batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():
    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    l1_w = tf.Variable(tf.truncated_normal([3, 3, 1, 64], stddev=0.1))
    l1_b = tf.Variable(tf.zeros([64]))
    l2_w = tf.Variable(tf.truncated_normal([3, 3, 64, 64], stddev=0.1))
    l2_b = tf.Variable(tf.zeros([64]))

    l3_w = tf.Variable(tf.truncated_normal([3, 3, 64, 128], stddev=0.1))
    l3_b = tf.Variable(tf.zeros([128]))
    l4_w = tf.Variable(tf.truncated_normal([3, 3, 128, 128], stddev=0.1))
    l4_b = tf.Variable(tf.zeros([128]))

    l5_w = tf.Variable(tf.truncated_normal([3, 3, 128, 256], stddev=0.1))
    l5_b = tf.Variable(tf.zeros([256]))
    l6_w = tf.Variable(tf.truncated_normal([3, 3, 256, 256], stddev=0.1))
    l6_b = tf.Variable(tf.zeros([256]))
    l7_w = tf.Variable(tf.truncated_normal([3, 3, 256, 256], stddev=0.1))
    l7_b = tf.Variable(tf.zeros([256]))

    l8_w = tf.Variable(tf.truncated_normal([3, 3, 256, 512], stddev=0.1))
    l8_b = tf.Variable(tf.zeros([512]))
    l9_w = tf.Variable(tf.truncated_normal([3, 3, 512, 512], stddev=0.1))
    l9_b = tf.Variable(tf.zeros([512]))
    l10_w = tf.Variable(tf.truncated_normal([3, 3, 512, 512], stddev=0.1))
    l10_b = tf.Variable(tf.zeros([512]))

    l11_w = tf.Variable(tf.truncated_normal([3, 3, 512, 512], stddev=0.1))
    l11_b = tf.Variable(tf.zeros([512]))
    l12_w = tf.Variable(tf.truncated_normal([3, 3, 512, 512], stddev=0.1))
    l12_b = tf.Variable(tf.zeros([512]))
    l13_w = tf.Variable(tf.truncated_normal([3, 3, 512, 512], stddev=0.1))
    l13_b = tf.Variable(tf.zeros([512]))

    l14_w = tf.Variable(tf.truncated_normal([512, num_hidden], stddev=0.1))
    l14_b = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

    l15_w = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    l15_b = tf.Variable(tf.constant(1.0, shape=[num_labels]))


    # Model.
    def model(data):
        conv_1 = tf.nn.relu(tf.nn.conv2d(data, l1_w, [1, 1, 1, 1], padding='SAME') + l1_b)
        conv_1 = tf.nn.relu(tf.nn.conv2d(conv_1, l2_w, [1, 1, 1, 1], padding='SAME') + l2_b)
        max_pool_1 = tf.nn.max_pool(conv_1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        conv_2 = tf.nn.relu(tf.nn.conv2d(max_pool_1, l3_w, [1, 1, 1, 1], padding='SAME') + l3_b)
        conv_2 = tf.nn.relu(tf.nn.conv2d(conv_2, l4_w, [1, 1, 1, 1], padding='SAME') + l4_b)
        max_pool_2 = tf.nn.max_pool(conv_2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        conv_3 = tf.nn.relu(tf.nn.conv2d(max_pool_2, l5_w, [1, 1, 1, 1], padding='SAME') + l5_b)
        conv_3 = tf.nn.relu(tf.nn.conv2d(conv_3, l6_w, [1, 1, 1, 1], padding='SAME') + l6_b)
        conv_3 = tf.nn.relu(tf.nn.conv2d(conv_3, l7_w, [1, 1, 1, 1], padding='SAME') + l7_b)
        max_pool_3 = tf.nn.max_pool(conv_3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        conv_4 = tf.nn.relu(tf.nn.conv2d(max_pool_3, l8_w, [1, 1, 1, 1], padding='SAME') + l8_b)
        conv_4 = tf.nn.relu(tf.nn.conv2d(conv_4, l9_w, [1, 1, 1, 1], padding='SAME') + l9_b)
        conv_4 = tf.nn.relu(tf.nn.conv2d(conv_4, l10_w, [1, 1, 1, 1], padding='SAME') + l10_b)
        max_pool_4 = tf.nn.max_pool(conv_4, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        conv_5 = tf.nn.relu(tf.nn.conv2d(max_pool_4, l11_w, [1, 1, 1, 1], padding='SAME') + l11_b)
        conv_5 = tf.nn.sigmoid(tf.nn.conv2d(conv_5, l12_w, [1, 1, 1, 1], padding='SAME') + l12_b)
        conv_5 = tf.nn.sigmoid(tf.nn.conv2d(conv_5, l13_w, [1, 1, 1, 1], padding='SAME') + l13_b)
        max_pool_5 = tf.nn.max_pool(conv_5, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        shape = max_pool_5.get_shape().as_list()
        reshape = tf.reshape(max_pool_5, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.sigmoid(tf.matmul(reshape, l14_w) + l14_b)
        return tf.matmul(hidden, l15_w) + l15_b


    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if step % 50 == 0:
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Recommended answer

I agree with @cyniikal, your network seems too complex for this dataset. With a single-layer model, I was able to achieve 93.75% accuracy on the training data and 86.7% accuracy on the test data.

In my model, I used GradientDescentOptimizer to minimize cross_entropy, just as you did, and a batch size of 16.

The main differences I see between your approach and mine are that I:

  1. One-hot encoded the labels
  2. Used a single-layer network instead of VGG-16

Please see this notebook with my single-layer model code sample.
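The linked notebook is not reproduced here, but a minimal sketch of what such a single-layer model might look like is shown below. This is an assumption of the setup, not the answerer's exact code; the 0.5 learning rate and 1001 steps are placeholder values. It reuses the reformatted arrays and the accuracy helper from the question, after flattening the images into 784-element vectors.

# Minimal single-layer sketch (assumed setup, not the answerer's notebook).
# Relies on the imports, reformat(), accuracy(), train_dataset and train_labels
# defined in the question's code above.
flat_train = train_dataset.reshape(-1, image_size * image_size)

simple_graph = tf.Graph()
with simple_graph.as_default():
    x = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    y = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

    # A single fully connected layer: 784 inputs -> 10 class logits.
    w = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels], stddev=0.1))
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(x, w) + b

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    prediction = tf.nn.softmax(logits)

with tf.Session(graph=simple_graph) as session:
    tf.global_variables_initializer().run()
    for step in range(1001):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_x = flat_train[offset:offset + batch_size, :]
        batch_y = train_labels[offset:offset + batch_size, :]
        _, l, p = session.run([optimizer, loss, prediction],
                              feed_dict={x: batch_x, y: batch_y})
        if step % 200 == 0:
            print('Step %d loss %.3f accuracy %.1f%%' % (step, l, accuracy(p, batch_y)))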

If you would like to add layers to your neural network (the network will converge with more difficulty), I highly recommend reading this article on neural nets. Specifically, since you added sigmoid as your last activation function, I believe you are suffering from a vanishing gradient problem. See this page to address the vanishing gradient.
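As a quick illustration of that vanishing-gradient point (this example is added here, not part of the original answer): the derivative of sigmoid(x) is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25, so each additional sigmoid layer scales the backpropagated gradient down by at least a factor of 4 even in the best case.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Best case: pre-activations sit at 0, where sigmoid's derivative peaks at 0.25.
grad = 1.0
for _ in range(3):   # three sigmoid activations, as in the posted model
    grad *= sigmoid(0.0) * (1.0 - sigmoid(0.0))
print(grad)          # 0.015625 -- already tiny after just three layers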

