Softmax Cross Entropy loss explodes
Problem description
I am creating a deep convolutional neural network for pixel-wise classification. I am using the Adam optimizer and softmax with cross-entropy loss.
I asked a similar question found
x = tf.placeholder(tf.float32, shape=[None, 7168])
# Per-pixel labels: 3 classes for each of the 7168 pixels
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])
# Many Convolutions and Relus omitted
final = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168, 7168, 3])
b_final = bias_variable([7168, 3])
# Contract axis 1 of `final` with axis 1 of W_final; the logits have shape [batch, 7168, 3]
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final
# Softmax is taken over the last axis (the 3 classes), then averaged over pixels and examples
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)
# Compare the predicted class with the labeled class for every pixel
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
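For reference, the per-pixel quantity that tf.nn.softmax_cross_entropy_with_logits computes over the last axis can be sketched in plain Python. This is only an illustration, not TensorFlow's implementation; the function name softmax_cross_entropy and the sample logits are made up for the example. It shows that the loss stays finite but grows roughly linearly with the size of a logit on the wrong class, which is one way a loss value can become very large. Whether that is what happens in the network above is not established here.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross entropy between softmax(logits) and a label distribution for one pixel."""
    # Subtract the max logit before exponentiating so exp() cannot overflow.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax(z_i) = z_i - log_sum_exp
    return -sum(y * (z - log_sum_exp) for y, z in zip(labels, logits))

# Hypothetical logits for a single pixel with 3 classes; the true class is index 1.
label = [0.0, 1.0, 0.0]
loss_uniform = softmax_cross_entropy([0.0, 0.0, 0.0], label)             # equals log(3)
loss_confident_wrong = softmax_cross_entropy([1000.0, 0.0, 0.0], label)  # about 1000
print(loss_uniform, loss_confident_wrong)
```

Watching the largest absolute logit alongside the loss during training is a cheap way to check whether growing logits accompany the spike.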
I'm not sure what exactly causes it, but I have had the same issue a few times. A few things generally help: you can reduce the learning rate, i.e. the bound on the learning rate for Adam (e.g. from 1e-5 to 1e-7 or so), or try stochastic gradient descent. Adam tries to estimate learning rates, which can lead to unstable training; see "Adam optimizer goes haywire after 200k batches, training loss grows".
I also once removed batchnorm, and that actually helped, but that was for a "specially" designed network for stroke data (= point sequences), which was not very deep and used Conv1d layers.
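To illustrate the learning-rate advice above, here is a minimal sketch of one textbook Adam update next to one plain SGD update for a single scalar parameter. It is an assumption-laden illustration, not TensorFlow's exact implementation: the helper names adam_step and sgd_step, the state dictionary, and the sample gradients are all hypothetical. It shows why the learning rate acts as a bound for Adam: on the first step Adam moves the parameter by about lr regardless of the gradient's magnitude, while the SGD step scales with the gradient.

```python
import math

def sgd_step(param, grad, lr):
    """Plain gradient descent: the step scales with the gradient."""
    return param - lr * grad

def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single scalar parameter."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

lr = 1e-5
steps = {}
for grad in (1e-3, 1.0, 1e3):
    state = {"t": 0, "m": 0.0, "v": 0.0}  # fresh optimizer state for each trial
    adam_delta = abs(adam_step(0.0, grad, state, lr))
    sgd_delta = abs(sgd_step(0.0, grad, lr))
    steps[grad] = (adam_delta, sgd_delta)
    print(grad, adam_delta, sgd_delta)
```

In the question's TensorFlow 1.x code, lowering the rate means changing the argument passed to tf.train.AdamOptimizer, and plain SGD is available as tf.train.GradientDescentOptimizer.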