Cross entropy loss suddenly increases to infinity


Problem description


I am attempting to replicate a deep convolutional neural network from a research paper. I have implemented the architecture, but after 10 epochs my cross-entropy loss suddenly increases to infinity. This can be seen in the chart below. You can ignore what happens to the accuracy after the problem occurs.

Here is the GitHub repository with a picture of the architecture.

After doing some research, I think using AdamOptimizer or ReLU might be the problem.

import tensorflow as tf

# weight_variable / bias_variable are helper functions defined with the
# omitted layers (not shown in the question).
x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])

# Many convolutions and ReLUs omitted

final = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168, 7168, 3])
b_final = bias_variable([7168, 3])
# Contract the 7168-wide feature axis of `final` against axis 1 of W_final,
# giving per-position logits of shape [batch, 7168, 3].
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)
# Per-position accuracy over the 3 classes.
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
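
The training loop is not shown in the question; a minimal sketch of one that stops as soon as the loss becomes non-finite (so the offending step and batch can be inspected) might look like the following, where num_steps and next_batch are assumed placeholders, not part of the original code:

import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):                    # num_steps: assumed
        x_batch, y_batch = next_batch()              # assumed data helper
        _, loss_value = sess.run(
            [train_step, cross_entropy],
            feed_dict={x: x_batch, y_: y_batch, keep_prob: 0.5})
        # Stop as soon as the loss blows up instead of training past it.
        if not np.isfinite(loss_value):
            print("loss became non-finite at step", step)
            break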

EDIT: If anyone is interested, the solution was that I was basically feeding in incorrect data.
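
Given that the root cause was bad input data, a quick per-batch sanity check before feeding the graph can surface this kind of problem early. This is only a sketch: check_batch is a hypothetical helper, and the expected shapes come from the placeholders above.

import numpy as np

def check_batch(x_batch, y_batch):
    # Hypothetical helper: validate a batch before it is fed to the graph.
    assert x_batch.shape[1] == 7168, "unexpected input width"
    assert y_batch.shape[1:] == (7168, 3), "unexpected label shape"
    # Non-finite inputs will eventually poison the loss.
    assert np.all(np.isfinite(x_batch)), "non-finite values in x"
    # Labels are expected to be one-hot over the last axis.
    assert np.allclose(y_batch.sum(axis=-1), 1.0), "labels are not one-hot"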

Solution

Solution: Control the solution space. This might mean using smaller datasets when training, it might mean using fewer hidden nodes, it might mean initializing your W and b differently. Your model is reaching a point where the loss is undefined, which might be due to the gradient being undefined, or to the final_conv signal itself.
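
For the "initializing your W and b differently" suggestion, one possibility is replacing the weight_variable / bias_variable helpers with a smaller-scale initializer so the initial logits of final_conv stay small; this is only a sketch, and the stddev value is an example, not taken from the question.

# Smaller initial weights keep the logits near zero at the start of training,
# so the softmax is less likely to saturate (stddev is an example value).
W_final = tf.Variable(tf.truncated_normal([7168, 7168, 3], stddev=0.01))
b_final = tf.Variable(tf.zeros([7168, 3]))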

Why: Sometimes, no matter what you do, numerical instability is reached. Eventually adding a machine epsilon to prevent dividing by zero (or taking log(0) in the cross-entropy loss here) just won't help, because even then the number cannot be accurately represented at the precision you are using. (Ref: https://en.wikipedia.org/wiki/Round-off_error and https://floating-point-gui.de/basic/)
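
To make this concrete, here is a sketch of a hand-rolled cross entropy (using the y_ and final_conv tensors from the question) where clipping by a small epsilon is the usual workaround for log(0), compared against the fused op the question already uses; the clip bounds are assumptions, not values from the original post.

# Hand-rolled version: softmax probabilities can underflow to exactly 0,
# and log(0) is -inf, so the usual workaround is to clip (epsilon ~ float32).
probs = tf.clip_by_value(tf.nn.softmax(final_conv), 1e-7, 1.0)
manual_xent = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(probs), axis=-1))

# The fused op works on the raw logits directly and is the numerically
# safer choice.
fused_xent = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))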

Considerations:
1) When tweaking epsilons, be sure to be consistent with your data type: use the machine epsilon of the precision you are working in (for float32 that is roughly 1.2e-7; ref: https://en.wikipedia.org/wiki/Machine_epsilon and python numpy machine epsilon). The snippet after this list shows how to query it.

2) Just in case others reading this are confused: the value in the constructor for AdamOptimizer is the learning rate, but you can also set the epsilon value, as shown in the sketch below (ref: How does paramater epsilon affects AdamOptimizer? and https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer).

3) Numerical instability in TensorFlow is real, and it's difficult to get around. Yes, there is tf.nn.softmax_cross_entropy_with_logits, but this is too specific (what if you don't want a softmax?). Refer to Vahid Kazemi's 'Effective Tensorflow' for an insightful explanation: https://github.com/vahidk/EffectiveTensorflow#entropy
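
The sketch below pulls points 1) to 3) together, assuming TensorFlow 1.x and the cross_entropy node defined in the question; the epsilon passed to AdamOptimizer is only an example value, and stable_log_softmax is an assumed helper name, not an existing TensorFlow API.

import numpy as np
import tensorflow as tf

# 1) Machine epsilon of the precision you are training in.
eps32 = np.finfo(np.float32).eps            # ~1.19e-7

# 2) The positional argument is the learning rate; epsilon is a separate
#    keyword argument (1e-4 here is only an example value).
train_step = tf.train.AdamOptimizer(learning_rate=1e-5,
                                    epsilon=1e-4).minimize(cross_entropy)

# 3) If the fused softmax op is too specific, a log-sum-exp based log-softmax
#    avoids exponentiating large logits directly (assumed helper, not a TF API).
def stable_log_softmax(logits):
    # Subtracting the max leaves the result unchanged but keeps exp() bounded.
    shifted = logits - tf.reduce_max(logits, axis=-1, keepdims=True)
    return shifted - tf.log(tf.reduce_sum(tf.exp(shifted), axis=-1, keepdims=True))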
