Why does my GradientDescentOptimizer produce NaN?


Problem description

I'm currently reworking Professor Andrew Ng's "Machine Learning" course assignments from Coursera, and I'm stuck on the Logistic Regression portion.

import numpy as np
import tensorflow as tf

filename = 'data/ex2data1.txt'
data = np.loadtxt(filename, delimiter=",", unpack=True)

# Data matrices
xtr = np.transpose(np.array(data[:-1]))
ytr = np.transpose(np.array(data[-1:]))

# Initial weights
W = tf.Variable(tf.zeros([2,1], dtype = tf.float64))

# Bias
b = tf.Variable(tf.zeros([1], dtype = tf.float64))

# Cost function
y_ = tf.nn.sigmoid(tf.matmul(xtr,W) + b)

cost = -tf.reduce_mean(ytr*tf.log(y_) + (1-ytr)*tf.log(1-y_))
optimize = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

corr = tf.equal(tf.argmax(ytr,1), tf.argmax(y_,1))
acc = tf.reduce_mean(tf.cast(corr, tf.float64))

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run(cost))
    for _ in range(3):
        sess.run(optimize)
        print(sess.run(cost))

This produces the output:

0.69314718056
nan
nan
nan

The first result from the cost function is correct, but the next ones are supposed to be:

3.0133
1.5207
0.7336

and instead I get a bunch of NaNs. I've tried lowering the learning rate, all to no avail. What am I doing wrong? And is it possible to reproduce this assignment in TensorFlow?

PS: Other Python solutions seem to be using scipy.optimize, but I have no idea how I would use that with TensorFlow values, and I would like to use only TensorFlow if at all possible.

I've also tried initializing the bias with tf.ones instead of tf.zeros, but that didn't work either.

Recommended answer

Your logarithm isn't sanitizing its input. It might very well happen that you have input values at or below zero, which quickly turn any floating-point arithmetic into NaN.
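To see the failure mode concretely (a small NumPy sketch, not part of the original answer): once the sigmoid output saturates to exactly 0.0 or 1.0 in floating point, which large unnormalized feature values can cause after a single update, the cross-entropy ends up evaluating log(0):

import numpy as np

print(np.log(0.0))        # -inf (with a divide-by-zero warning)
print(0.0 * np.log(0.0))  # nan -- one such term poisons the whole mean cost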

What I do in Java code that makes heavy use of logarithms in a similar domain:

  • Check for NaN or Infinity and assume the output to be zero
  • If the input is negative, clip the output to some static value, e.g. log(1e-5) ≈ -11.51
  • Otherwise just take the logarithm

In Java that code looks like this; it shouldn't be difficult to translate to tf:

public static double guardedLogarithm(double input) {
    if (Double.isNaN(input) || Double.isInfinite(input)) {
        return 0d;
    } else if (input <= 0d || input <= -0d) {
        // assume a quite low value of log(1e-5) ~= -11.51
        return -10d;
    } else {
        return FastMath.log(input);
    }
}
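
A minimal TensorFlow translation of that guard, sketched against the y_, ytr, and cost from the question above (an illustration, not the answerer's code): clipping the sigmoid output away from exact 0 and 1 before taking the log keeps the cross-entropy finite.

# Sketch: clip the sigmoid output into (eps, 1 - eps) so tf.log never sees 0,
# then reuse the same cross-entropy cost and optimizer from the question.
eps = 1e-5
y_clipped = tf.clip_by_value(y_, eps, 1.0 - eps)
cost = -tf.reduce_mean(ytr * tf.log(y_clipped) + (1 - ytr) * tf.log(1 - y_clipped))
optimize = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

Alternatively, tf.nn.sigmoid_cross_entropy_with_logits computes this loss directly from the raw logits tf.matmul(xtr, W) + b, which sidesteps the log-of-zero problem entirely.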
