Calculating cross entropy manually vs using softmax_cross_entropy_with_logits in TensorFlow


Question

I'm running into an issue while trying to build a deep ReLU network for the MNIST dataset in TensorFlow. It works fine when I use the built-in tf.nn.softmax_cross_entropy_with_logits() as my loss, but calculating the cross-entropy term manually doesn't seem to work.

Here is what the network looks like:

import tensorflow as tf

train_subset = 200
num_features = 784   # 28 x 28 MNIST images, flattened
num_labels = 10
num_units = 200

# Input placeholder for a batch of flattened MNIST images
x = tf.placeholder(tf.float32, [None, num_features], name='x-input')

bias1 = tf.Variable(tf.constant(0.1, shape=[num_units]), name="bias1")
bias2 = tf.Variable(tf.constant(0.1, shape=[num_units]), name="bias2")
bias3 = tf.Variable(tf.constant(0.1, shape=[num_units]), name="bias3")
bias_out = tf.Variable(tf.constant(0.1, shape=[num_labels]), name="bias_out")

weights1 = tf.Variable(tf.random_normal([num_features, num_units]), name="weights_layer1")
weights2 = tf.Variable(tf.random_normal([num_units, num_units]), name="weights_layer2")
weights3 = tf.Variable(tf.random_normal([num_units, num_units]), name="weights_layer3")
weights_out = tf.Variable(tf.random_normal([num_units, num_labels]), name="weights_out")

# The deep ReLU network: three hidden layers followed by a linear output layer
h_relu1 = tf.nn.relu(tf.add(tf.matmul(x, weights1), bias1))
h_relu2 = tf.nn.relu(tf.add(tf.matmul(h_relu1, weights2), bias2))
h_relu3 = tf.nn.relu(tf.add(tf.matmul(h_relu2, weights3), bias3))
logits = tf.matmul(h_relu3, weights_out) + bias_out

In other words, this works fine:

# Assume that y_ is fed a batch of output labels for MNIST
y_ = tf.placeholder(tf.float32, [None, num_labels], name='y-input')
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
optimizer = tf.train.AdamOptimizer(1e-3).minimize(cost)

But not this:

y = tf.nn.softmax(logits)
cost = -tf.reduce_sum(y_ * tf.log(y))
optimizer = tf.train.AdamOptimizer(1e-3).minimize(cost)

The latter runs fine, but the accuracy gets stuck after an initial step, whereas the former, which uses the softmax_cross_entropy_with_logits function, actually does learn something. I've seen the latter setup used for the deep MNIST example, which is why I'm wondering what it is about my setup here that causes the optimization procedure to stall.

Answer

Update:

In the end I was able to solve this by implementing the inside of the softmax_cross_entropy_with_logits() function myself; you can find the code here on my GitHub. It comes in two versions, one for the normal case and one for multi-label problems.
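For reference, a minimal sketch of what a numerically stable manual version can look like (this is not the answerer's GitHub code; the stable_softmax_cross_entropy helper and the max-subtraction/log-sum-exp trick are illustrative assumptions), reusing the logits and y_ tensors defined in the question:

# Hedged sketch: a manually implemented, numerically stable softmax
# cross-entropy, close in spirit to what
# tf.nn.softmax_cross_entropy_with_logits computes internally.
def stable_softmax_cross_entropy(logits, labels):
    # Subtract the per-row maximum so tf.exp never overflows.
    shifted = logits - tf.reduce_max(logits, reduction_indices=[1], keep_dims=True)
    # log(softmax) = shifted - log(sum(exp(shifted)))
    log_softmax = shifted - tf.log(
        tf.reduce_sum(tf.exp(shifted), reduction_indices=[1], keep_dims=True))
    # Per-example cross-entropy, averaged over the batch so the loss
    # scale does not depend on the batch size.
    return tf.reduce_mean(-tf.reduce_sum(labels * log_softmax, reduction_indices=[1]))

cost = stable_softmax_cross_entropy(logits, y_)
optimizer = tf.train.AdamOptimizer(1e-3).minimize(cost)

Compared with -tf.reduce_sum(y_ * tf.log(tf.nn.softmax(logits))), this avoids taking the log of softmax values that can underflow to exactly zero, which is one common reason the manual formulation stalls.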

Previous answer:

Originally from the TensorFlow API:

"(Note that in the source code, we don't use this formulation,

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

because it is numerically unstable. Instead, we apply tf.nn.softmax_cross_entropy_with_logits on the unnormalized logits (e.g., we call softmax_cross_entropy_with_logits on tf.matmul(x, W) + b), because this more numerically stable function internally computes the softmax activation. In your code, consider using tf.nn.(sparse_)softmax_cross_entropy_with_logits instead.)"

Source: https://www.tensorflow.org/versions/r0.11/tutorials/mnist/beginners/
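As a quick illustration of that recommendation, a minimal sketch of the sparse variant, which takes integer class indices instead of one-hot labels, might look like this (the y_idx placeholder is an assumption, not part of the original question):

# Sparse variant: labels are integer class indices of shape [batch_size],
# so no one-hot encoding, tf.nn.softmax, or manual tf.log is needed.
y_idx = tf.placeholder(tf.int64, [None], name='y-index-input')
cost = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_idx, logits=logits))
optimizer = tf.train.AdamOptimizer(1e-3).minimize(cost)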

