TensorFlow XOR code works fine with two-dimensional target but not without?

Problem description

Trying to implement a very basic XOR FFNN in TensorFlow. I may just be misunderstanding the code, but can anyone see an obvious reason why this won't work? It blows up to NaNs and starts with a loss of 0. The works/doesn't-work toggles are in the code if you want to mess around with it. Thanks!

import math
import tensorflow as tf
import numpy as np

HIDDEN_NODES = 10

x = tf.placeholder(tf.float32, [None, 2])
W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES]))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)
#-----------------
#DOESN"T WORK
W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 1]))
b_logits = tf.Variable(tf.zeros([1]))
logits = tf.add(tf.matmul(hidden, W_logits),b_logits)
#WORKS
# W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2]))
# b_logits = tf.Variable(tf.zeros([2]))
# logits = tf.add(tf.matmul(hidden, W_logits),b_logits)
#-----------------

y = tf.nn.softmax(logits)

#-----------------
#DOESN"T WORK
y_input = tf.placeholder(tf.float32, [None, 1])

#WORKS
#y_input = tf.placeholder(tf.float32, [None, 2])
#-----------------

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y_input)
loss = tf.reduce_mean(cross_entropy)
loss = cross_entropy
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

init_op = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init_op)

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

#-----------------
#DOESN"T WORK
yTrain = np.array([[0], [1], [1], [0]])
# WORKS
#yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
#-----------------

for i in xrange(500):
  _, loss_val,logitsval = sess.run([train_op, loss,logits], feed_dict={x: xTrain, y_input: yTrain})

  if i % 10 == 0:
    print "Step:", i, "Current loss:", loss_val,"logits",logitsval

print sess.run(y,feed_dict={x: xTrain})

Recommended answer

TL;DR: For this to work, you should use

loss = tf.nn.l2_loss(logits - y_input)

...instead of tf.nn.softmax_cross_entropy_with_logits.
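
As a minimal sketch of that fix (assuming the single-output, [None, 1] variant of the graph above; the prediction line is an added suggestion, not from the original answer), only the loss needs to change:

# Sketch of the fix for the single-output ([None, 1]) variant.
y_input = tf.placeholder(tf.float32, [None, 1])

# Squared-error loss on the raw logits instead of softmax cross-entropy.
loss = tf.nn.l2_loss(logits - y_input)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Read predictions from the raw output: a softmax over a single
# column is always 1.0, so y = tf.nn.softmax(logits) stays degenerate.
y = logits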

The tf.nn.softmax_cross_entropy_with_logits operator expects its logits and labels inputs to be matrices of size batch_size by num_classes. Each row of logits is an unscaled probability distribution across the classes, and each row of labels is a one-hot encoding of the true class for the corresponding example in the batch. If the inputs do not match these assumptions, the training process may diverge.
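
To make the shape contract concrete (an illustrative example, not part of the original answer), the working two-class configuration feeds batch_size-by-2 matrices, with each labels row one-hot:

# Illustration only: shapes that satisfy softmax_cross_entropy_with_logits
# for the XOR batch (batch_size = 4, num_classes = 2).
example_logits = np.array([[ 2.0, -1.0],   # unscaled per-class scores
                           [-1.0,  2.0],
                           [-1.0,  2.0],
                           [ 2.0, -1.0]])
example_labels = np.array([[1.0, 0.0],     # one-hot: true class is 0
                           [0.0, 1.0],     # one-hot: true class is 1
                           [0.0, 1.0],
                           [1.0, 0.0]])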

In this code, the logits are batch_size by 1, which means that there is only a single class, and the softmax outputs a prediction of class 0 for all of the examples; the labels are not one-hot. If you look at the implementation of the operator, the backprop value for tf.nn.softmax_cross_entropy_with_logits is:

// backprop: prob - labels, where
//   prob = exp(logits - max_logits) / sum(exp(logits - max_logits))

This will be [[1], [1], [1], [1]] - [[0], [1], [1], [0]] in every step, which clearly does not converge.
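
A quick NumPy check confirms this (an added illustration, not part of the original answer): with a single logits column, prob is 1 for every example regardless of the weights, so the backprop value can never reach zero for the label-0 rows:

import numpy as np

one_col_logits = np.array([[0.3], [-1.2], [2.5], [0.0]])  # any 4 x 1 logits
prob = np.exp(one_col_logits - one_col_logits.max(axis=1, keepdims=True))
prob /= prob.sum(axis=1, keepdims=True)   # always [[1.], [1.], [1.], [1.]]
labels = np.array([[0.], [1.], [1.], [0.]])
print(prob - labels)                      # [[1.], [0.], [0.], [1.]] every step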
