Why is the training error increasing with each epoch, linear regression implemented using tensorflow?
Question
I am new to tensorflow and have implemented a linear regression model. The dataset I am using is available at https://archive.ics.uci.edu/ml/datasets/Housing. At each epoch the loss is increasing. Here is my code -
import tensorflow as tf
import numpy as np
A = np.loadtxt("housing.data",dtype=np.float32)
s = A.shape
B = A[:,:s[1]-1]
C = A[:,-1]
C = C.reshape(s[0],1)
W = tf.Variable(tf.ones([s[1]-1,1]))
b = tf.Variable([.3],tf.float32)
x = tf.placeholder(tf.float32,shape = (None,s[1]-1))
y = tf.placeholder(tf.float32,shape = (None,1))
linear_model = tf.matmul(x,W) + b
loss = tf.reduce_mean(tf.square(linear_model - y)) # mean of the squared errors
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    sess.run(train, feed_dict={x: B, y: C})
curr_W, curr_b, curr_loss = sess.run([W, b, loss], feed_dict={x:B, y:C})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
Answer
Your loss increases for just a few iterations before reaching NaN. The problem seems to be that the initial loss is very big (10^13), hence probably its gradient too, which creates an update that is way too big, puts your parameters in an even worse spot, and eventually produces NaNs in the gradient backpropagation (probably through an overflow somehow, or because it produces a 0 somewhere and divides by it).
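This blow-up is easy to reproduce in a tiny sketch (plain Python, illustrative only, not the Housing data): gradient descent on a one-sample squared loss with a large-scale feature diverges when the step size is too big, and the overflow eventually turns into NaN exactly as described above.

```python
import math

def descend(lr, steps=200):
    """Gradient descent on loss(w) = (w * x - y)^2 for one sample."""
    x, y = 100.0, 50.0      # large-scale feature -> huge initial loss
    w = 1.0                 # initial loss is (100 - 50)^2 = 2500
    for _ in range(steps):
        grad = 2.0 * (w * x - y) * x   # d(loss)/dw
        w = w - lr * grad
    return w

# Too large a step: the error is multiplied by (1 - 2 * lr * x^2) = -199
# each update, so it overflows to inf and then inf - inf gives NaN.
print(descend(lr=0.01))        # nan
# A small enough step converges to the minimizer y/x = 0.5.
print(descend(lr=0.00001))
```

The same mechanism is at work in the question's model, only with thirteen un-normalized features instead of one.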
You can fix this by lowering your learning rate, which will compensate for these huge gradient values from the start. 0.000001 fixes the problem for me (though this is a very low value compared to usual learning rates, so it is a bit of a problem to have to use it only because of the first steps). However, since it's just for the start, you might want a higher learning rate for the rest of your training. You can change it after a few steps, or, more robustly, clip your gradients.
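A crude version of "change it after a few steps" can be sketched as a step schedule (the 0.000001 value is from this answer; the switch point and the later rate are made-up placeholders):

```python
def lr_schedule(step, warmup_steps=100, warmup_lr=0.000001, base_lr=0.01):
    """Tiny learning rate while the loss (and gradient) is huge, larger afterwards."""
    return warmup_lr if step < warmup_steps else base_lr

print(lr_schedule(0))     # 1e-06 during the first, dangerous steps
print(lr_schedule(500))   # 0.01 once the parameters are in a sane region
```

In the TF1 graph above you would feed the current value through a placeholder rather than hard-coding it in the optimizer.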
Edit

Also, you should initialize your weights randomly:
W = tf.Variable(tf.truncated_normal([s[1]-1,1], stddev=0.1))
It learns quite well for me with random initialization and gradient clipping:
optimizer = tf.train.GradientDescentOptimizer(0.0005)
gvs = optimizer.compute_gradients(loss)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train = optimizer.apply_gradients(capped_gvs)
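The clipping step above is elementwise; a NumPy sketch of what tf.clip_by_value does to each gradient tensor (the gradient values here are made up for illustration):

```python
import numpy as np

# Hypothetical raw gradients, some far outside [-1, 1].
grads = np.array([0.3, -7.5, 120.0, -0.9])

# Elementwise clipping, the same effect as tf.clip_by_value(grad, -1., 1.):
# values inside the interval pass through, values outside saturate at the bounds.
capped = np.clip(grads, -1.0, 1.0)
print(capped)
```

This caps the size of each update regardless of how large the loss (and hence the raw gradient) is, which is why it tames the huge first steps.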
That was with 10000 iterations and learning rate = 0.0005, but you should probably use a decaying learning rate, starting there and getting smaller after a while.
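The decaying schedule suggested above can be sketched as exponential decay (the 0.0005 starting rate is from this answer; the decay rate and decay interval are made-up placeholders, mirroring what tf.train.exponential_decay computes in its non-staircase form):

```python
def decayed_lr(step, base_lr=0.0005, decay_rate=0.96, decay_steps=1000):
    """Exponential decay: base_lr * decay_rate ** (step / decay_steps)."""
    return base_lr * decay_rate ** (step / decay_steps)

print(decayed_lr(0))       # 0.0005 at the start
print(decayed_lr(10000))   # smaller by the end of the 10000 iterations
```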