Tensorflow - Loss increases to NaN


Problem Description

I am going through Udacity's Deep Learning course. The interesting thing I am observing is that, for the same dataset, my 1-layer neural network works perfectly fine, but when I add more layers my loss increases to NaN.

I am using the following blog post as a reference: http://www.ritchieng.com/machine-learning/deep-learning/tensorflow/regularization/

Here is my code:

batch_size = 128
beta = 1e-3

# Network Parameters
n_hidden_1 = 1024 # 1st layer number of neurons
n_hidden_2 = 512 # 2nd layer number of neurons

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                  shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)


    # Variables.
    w1 = tf.Variable(tf.truncated_normal([image_size * image_size, n_hidden_1]))
    w2 = tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2],stddev=math.sqrt(2.0/n_hidden_1)))
    w3 = tf.Variable(tf.truncated_normal([n_hidden_2, num_labels],stddev=math.sqrt(2.0/n_hidden_2)))

    b1 = tf.Variable(tf.zeros([n_hidden_1]))
    b2 = tf.Variable(tf.zeros([n_hidden_2]))
    b3 = tf.Variable(tf.zeros([num_labels]))

    # Learning rate decay configs
    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 0.5

    # Training computation.
    logits_1 = tf.matmul(tf_train_dataset, w1) + b1
    hidden_layer_1 = tf.nn.relu(logits_1)
    # keep_prob (the dropout keep probability) is assumed to be defined earlier in the notebook
    layer_1_dropout = tf.nn.dropout(hidden_layer_1, keep_prob)

    logits_2 = tf.matmul(layer_1_dropout, w2) + b2
    hidden_layer_2 = tf.nn.relu(logits_2)
    layer_2_dropout = tf.nn.dropout(hidden_layer_2, keep_prob)

    # the output logits
    logits_3 = tf.matmul(layer_2_dropout, w3) + b3


    # Normal Loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_3, labels=tf_train_labels))

    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 10000, 0.96)
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):

        # some logic to get training data batches

        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        # train_prediction (the prediction op, e.g. a softmax over logits_3) is defined elsewhere in the notebook
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)

        print("Minibatch loss at step %d: %f" % (step, l))

After printing the loss, I see that it increases roughly exponentially until it becomes NaN:

Minibatch loss at step 1: 7474.770508
Minibatch loss at step 2: 43229.820312
Minibatch loss at step 3: 50132.988281
Minibatch loss at step 4: 10196093.000000
Minibatch loss at step 5: 3162884096.000000
Minibatch loss at step 6: 25022026481664.000000
Minibatch loss at step 7: 651425419900819079168.000000
Minibatch loss at step 8: 21374465836947504345731163114962944.000000
Minibatch loss at step 9: nan
Minibatch loss at step 10: nan

My code is nearly identical to the code in that post, but I am still getting NaN.

Any suggestions as to what I might have done wrong here?

Recommended Answer

This is because the ReLU activation function can cause exploding gradients. Therefore, you need to reduce the learning rate accordingly (in your case, starter_learning_rate). You can also try a different activation function.
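
As a rough illustration of those two suggestions, here is a minimal sketch of how the graph from the question could be adjusted. The specific learning rate value (0.05) and the choice of tanh are illustrative assumptions, not values given in the answer; the names (logits_1, logits_2, starter_learning_rate) are the ones from the question's graph.

    # Sketch only: use a much smaller initial learning rate than the original 0.5,
    # so a few large ReLU gradients cannot blow the weights up within a handful of steps.
    starter_learning_rate = 0.05  # illustrative value, tune for your setup

    # Alternatively, swap ReLU for a bounded activation such as tanh,
    # which keeps the hidden activations (and hence the logits) from exploding:
    hidden_layer_1 = tf.nn.tanh(logits_1)
    hidden_layer_2 = tf.nn.tanh(logits_2)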

Here, (In simple multi-layer FFNN only ReLU activation function doesn't converge) is a similar problem to your case. Follow the answer there and you will understand.

Hope this helps.
