Softmax Cross Entropy loss explodes


Problem description



I am creating a deep convolutional neural network for pixel-wise classification. I am using the Adam optimizer and softmax with cross-entropy loss.

Github Repository

I asked a similar question, found here.

import tensorflow as tf  # TensorFlow 1.x

# Inputs: 7168 pixels per example; labels are one-hot over 3 classes per pixel
x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])

# Many convolutions and ReLUs omitted

final = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168, 7168, 3])  # helper defined elsewhere in the repo
b_final = bias_variable([7168, 3])          # helper defined elsewhere in the repo
# Per-pixel logits, shape [batch, 7168, 3]
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final

# Softmax cross entropy over the class dimension, averaged over pixels and batch
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)

# Per-pixel accuracy
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Solution

I'm not sure what causes it exactly. I had the same issue a few times. A few things generally help: you might reduce the learning rate, i.e. the bound of the learning rate for Adam (e.g. 1e-5 down to 1e-7 or so), or try stochastic gradient descent. Adam tries to estimate learning rates, which can lead to unstable training: see Adam optimizer goes haywire after 200k batches, training loss grows.
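
As a rough sketch of what that suggestion looks like against the graph above (the concrete learning-rate values are assumptions, not tested settings for this model):

# Option 1: keep Adam but lower its learning rate, e.g. toward 1e-7
train_step = tf.train.AdamOptimizer(learning_rate=1e-7).minimize(cross_entropy)

# Option 2: plain stochastic gradient descent, which does not adapt
# per-parameter step sizes and can be more stable when the loss blows up
# train_step = tf.train.GradientDescentOptimizer(learning_rate=1e-5).minimize(cross_entropy)

Only one of the two train_step definitions would be kept, of course.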

Once I also removed batch norm, and that actually helped, but that was a special case: a network designed for stroke data (= point sequences), which was not very deep and used Conv1d layers.
