Tensorflow:批次大小>不能过度拟合训练数据1个 [英] Tensorflow: Can't overfit training data with batch size > 1

查看:178
本文介绍了Tensorflow:批次大小>不能过度拟合训练数据1个的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我用Tensorflow编码了一个小型RNN网络,以在给定某些参数的情况下返回总能耗.我的代码似乎有问题.当我使用大于1的批次大小(即使只有4个样本!)时,它也不会过拟合训练数据.在下面的代码中,当我将BatchSize设置为1时,损耗值达到0.但是,通过将BatchSize设置为2,网络将无法过度拟合,损耗值将达到12.500000并永远卡在那里.

I coded a small RNN network with Tensorflow to return the total energy consumption given some parameters. There seem to be a problem in my code. It can't overfit the training data when I use a batch size > 1 (even with only 4 samples!). In the code below, the loss value reaches 0 when I set BatchSize to 1. However, by setting BatchSize to 2, the network fails to overfit and the loss value goes toward 12.500000 and gets stuck there forever.

我怀疑这与LSTM状态有关.如果不每次迭代都更新状态,则会遇到相同的问题.还是成本函数?感谢您的帮助.谢谢.

I suspect this has something to do with LSTM states. I get the same problem if I don't update the state with each iteration. Or maybe the cost function? A help is appreciated. Thanks.

import tensorflow as tf
import numpy as np
import os

from utils import loadData

Epochs = 10000
LearningRate = 0.0001
MaxGradNorm = 5

SeqLen = 1
NChannels = 28
NClasses = 1

NLayers = 2
NUnits = 256

BatchSize = 1
NumSamples = 4
#################################################################

trainingFile = "./training.dat"

X_values, Y_values = loadData(trainingFile, SeqLen, NumSamples)

X = tf.placeholder(tf.float32, [BatchSize, SeqLen, NChannels], name='inputs')

Y = tf.placeholder(tf.float32, [BatchSize, SeqLen, NClasses], name='labels')

keep_prob = tf.placeholder(tf.float32, name='keep')

initializer = tf.contrib.layers.xavier_initializer()

Xin = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))

lstm_layers = []

for i in range(NLayers):

    lstm_layer = tf.nn.rnn_cell.LSTMCell(num_units=NUnits, initializer=initializer, use_peepholes=True, state_is_tuple=True)

    dropout_layer = tf.contrib.rnn.DropoutWrapper(lstm_layer, output_keep_prob=keep_prob)

    #[LSTM ---> DROPOUT] ---> [LSTM ---> DROPOUT] ---> etc...
    lstm_layers.append(dropout_layer)   

rnn = tf.nn.rnn_cell.MultiRNNCell(lstm_layers, state_is_tuple=True)

initial_state = rnn.zero_state(BatchSize, tf.float32)

outputs, final_state = tf.nn.static_rnn(rnn, Xin, dtype=tf.float32, initial_state=initial_state)

outputs = tf.transpose(outputs, [1,0,2])
outputs = tf.reshape(outputs, [-1, NUnits])

weight = tf.Variable(tf.truncated_normal([NUnits, NClasses]))
bias = tf.Variable(tf.constant(0.1, shape=[NClasses]))
prediction = tf.matmul(outputs, weight) + bias
prediction = tf.reshape(prediction, [BatchSize, SeqLen, NClasses])

cost = tf.reduce_sum(tf.pow(tf.subtract(prediction, Y), 2)) / (2 * BatchSize)

tvars = tf.trainable_variables()

grad, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), MaxGradNorm)

optimizer = tf.train.AdamOptimizer(learning_rate = LearningRate)

train_step = optimizer.apply_gradients(zip(grad, tvars))

sess = tf.Session()

sess.run(tf.global_variables_initializer())

iteration = 1

for e in range(0, Epochs):

    train_loss = []

    state = sess.run(initial_state)

    for i in xrange(0, len(X_values), BatchSize):
        x = X_values[i:i + BatchSize]
        y = Y_values[i:i + BatchSize]

    y = np.expand_dims(y, 2)

        feed = {X : x, Y : y, keep_prob : 1.0, initial_state : state}

        _ , loss, state, pred = sess.run([train_step, cost, final_state, prediction], feed_dict = feed)

        train_loss.append(loss)

        iteration += 1

    print("Epoch: {}/{}".format(e, Epochs), "Iteration: {:d}".format(iteration), "Train average rmse: {:6f}".format(np.mean(train_loss)))

推荐答案

对输入数据进行规范化解决了该问题.

Normalizing the input data solved the problem.

这篇关于Tensorflow:批次大小>不能过度拟合训练数据1个的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆