Why does my linear regression get nan values instead of learning?


Question

I am running the following code:

import tensorflow as tf

# data set
x_data = [10., 20., 30., 40.]
y_data = [20., 40., 60., 80.]

# try to find values for W and b that compute y_data = W * x_data + b
# initial range is -1000 ~ 1000
W = tf.Variable(tf.random_uniform([1], -1000., 1000.))
b = tf.Variable(tf.random_uniform([1], -1000., 1000.))

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

# my hypothesis
hypothesis = W * X + b

# Simplified cost function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# minimize
a = tf.Variable(0.1)  # learning rate, alpha
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)  # goal is minimize cost

# before starting, initialize the variables
init = tf.initialize_all_variables()

# launch
sess = tf.Session()
sess.run(init)

# fit the line
for step in xrange(2001):
    sess.run(train, feed_dict={X: x_data, Y: y_data})
    if step % 100 == 0:
        print step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W), sess.run(b)

print sess.run(hypothesis, feed_dict={X: 5})
print sess.run(hypothesis, feed_dict={X: 2.5})

And here is the result:

0 1.60368e+10 [ 4612.54003906] [ 406.81304932]
100 nan [ nan] [ nan]
200 nan [ nan] [ nan]
300 nan [ nan] [ nan]
400 nan [ nan] [ nan]
500 nan [ nan] [ nan]
600 nan [ nan] [ nan]
700 nan [ nan] [ nan]
800 nan [ nan] [ nan]
900 nan [ nan] [ nan]
1000 nan [ nan] [ nan]
1100 nan [ nan] [ nan]
1200 nan [ nan] [ nan]
1300 nan [ nan] [ nan]
1400 nan [ nan] [ nan]
1500 nan [ nan] [ nan]
1600 nan [ nan] [ nan]
1700 nan [ nan] [ nan]
1800 nan [ nan] [ nan]
1900 nan [ nan] [ nan]
2000 nan [ nan] [ nan]
[ nan]
[ nan]

I don't understand why the result is nan.

If I change the initial data to this:

x_data = [1., 2., 3., 4.]
y_data = [2., 4., 6., 8.]

Then it works with no problem. Why is that?

Answer

You are overflowing float32 because the learning rate is too high for your problem. Instead of converging, the weight variable (W) oscillates with larger and larger magnitude on each step of gradient descent.
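
You can see the blow-up without TensorFlow at all. Below is a minimal NumPy sketch of the same gradient descent done by hand (my addition, not part of the original answer); the names mirror the question's script:

import numpy as np

x = np.array([10., 20., 30., 40.])
y = np.array([20., 40., 60., 80.])

lr = 0.1           # the learning rate from the question
W, b = 0.0, 0.0    # start from zero for a clean illustration

for step in range(5):
    residual = W * x + b - y
    grad_W = 2.0 * np.mean(x * residual)   # d(cost)/dW for mean squared error
    grad_b = 2.0 * np.mean(residual)       # d(cost)/db
    W -= lr * grad_W
    b -= lr * grad_b
    print(step, W, b)

After one step W is already 300; after two it is about -44450, and it keeps flipping sign while growing. Back-of-the-envelope (again my own calculation, not from the original answer): each step multiplies W's distance from the optimum by roughly 1 - 2 * lr * mean(x**2), so stability needs lr < 1/mean(x**2). That bound is 1/750 ≈ 0.0013 for x_data = [10., 20., 30., 40.] but 1/7.5 ≈ 0.13 for [1., 2., 3., 4.], which is why 0.1 diverges on your data yet works on the smaller-magnitude data.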

If you change

a = tf.Variable(0.1)

to

a = tf.Variable(0.001)

then the weights should converge. You will probably want to increase the number of iterations (to ~50000) too.
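
For concreteness, here is that change in context, slotted into the script from the question (same TF 1.x API; 50000 is the rough figure suggested above, not a tuned value):

a = tf.Variable(0.001)  # small enough for x values up to 40
optimizer = tf.train.GradientDescentOptimizer(a)
train = optimizer.minimize(cost)

# more iterations to compensate for the smaller step size
for step in xrange(50001):
    sess.run(train, feed_dict={X: x_data, Y: y_data})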

Picking a good learning rate is often the first challenge when implementing or using a machine learning algorithm. A loss that grows instead of converging to a minimum is usually a sign that the learning rate is too high.

In your case, the specific problem of fitting the line is more vulnerable to diverging weights when the training data has larger magnitudes. This is one reason why it is common to normalise data prior to training, e.g. in neural networks.
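
As a sketch of that idea (my addition, not from the original answer): standardising the inputs keeps mean(x**2) at 1, so the same learning rate behaves the same way regardless of the raw scale of the data.

import numpy as np

x_data = np.array([10., 20., 30., 40.])
y_data = np.array([20., 40., 60., 80.])

# standardise inputs to zero mean and unit variance
x_mean, x_std = x_data.mean(), x_data.std()
x_norm = (x_data - x_mean) / x_std

# train on x_norm with the original learning rate of 0.1; at prediction
# time apply the same transform to the query, e.g. (5. - x_mean) / x_std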

In addition, your starting weight and bias are drawn from a very large range, which means they can start very far from the ideal values, with very large loss values and gradients at the start. Picking a good range for initial values is another critical thing to get right as you move on to more advanced learning algorithms.
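
For example (a hedged tweak of the question's code, not a prescription): drawing W and b from a small interval keeps the initial residuals, and therefore the first gradients, moderate.

# initialise in [-1, 1] instead of [-1000, 1000]; the first residuals are
# then on the order of the data itself rather than in the tens of thousands
W = tf.Variable(tf.random_uniform([1], -1., 1.))
b = tf.Variable(tf.random_uniform([1], -1., 1.))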
