Why is the training error increasing with each epoch, linear regression implemented using tensorflow?


Question


I am new to tensorflow and have implemented a linear regression model. The dataset I am using is available at https://archive.ics.uci.edu/ml/datasets/Housing. At each epoch the loss is increasing. Here is my code -

import tensorflow as tf
import numpy as np

A = np.loadtxt("housing.data",dtype=np.float32)  # load the raw housing data
s = A.shape
B = A[:,:s[1]-1]       # features: all columns except the last
C = A[:,-1]            # target: the last column
C = C.reshape(s[0],1)

W = tf.Variable(tf.ones([s[1]-1,1]))
b = tf.Variable([.3],tf.float32)

x = tf.placeholder(tf.float32,shape = (None,s[1]-1))
y = tf.placeholder(tf.float32,shape = (None,1))

linear_model = tf.matmul(x,W) + b
loss = tf.reduce_mean(tf.square(linear_model - y)) # mean squared error
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in xrange(1000):
    sess.run(train,feed_dict={x:B,y:C})


curr_W, curr_b, curr_loss  = sess.run([W, b, loss], feed_dict={x:B, y:C})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))

Answer


Your loss increases for just a few iterations before reaching NaN. The problem seems to be that the initial loss is very big (~10^13), and hence probably its gradient too, which produces an update that is way too big, moves your parameters to an even worse spot, and eventually yields NaNs during gradient backpropagation (probably through an overflow somewhere, or because a zero is produced and then divided by).
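
As a quick sanity check, here is a minimal sketch (assuming the same session, graph and feed data as in the question) that prints the loss at each step so the divergence to NaN is easy to see:

for i in xrange(10):
    _, l = sess.run([train, loss], feed_dict={x: B, y: C})
    print("step %d, loss %s" % (i, l))  # the loss grows for a few steps, then becomes NaN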


You can fix this by lowering your learning rate, which compensates for these huge gradient values at the start. 0.000001 fixes the problem for me (this is a very low value compared to usual learning rates, though, so it is a bit unfortunate to need it only because of the first steps). However, since it is only needed at the start, you might want a higher learning rate for the rest of your training. You can change it after a few steps, or, more robustly, clip your gradients.
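
One way to change the learning rate after a few steps is to feed it through a placeholder; the sketch below only illustrates that idea, and the 100-step threshold and the two rates are assumptions, not values from the original answer:

lr = tf.placeholder(tf.float32, shape=[])           # learning rate supplied at run time
optimizer = tf.train.GradientDescentOptimizer(lr)
train = optimizer.minimize(loss)

for i in xrange(1000):
    # tiny rate while the loss (and its gradient) is still huge, larger afterwards
    current_lr = 0.000001 if i < 100 else 0.0001
    sess.run(train, feed_dict={x: B, y: C, lr: current_lr})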

EDIT


Also, you should initialize your weights randomly:

W = tf.Variable(tf.truncated_normal([s[1]-1,1], stddev=0.1))  # small random initial weights


It learns quite well for me with random initialization and gradient clipping:

optimizer = tf.train.GradientDescentOptimizer(0.0005)

gvs = optimizer.compute_gradients(loss)                                     # list of (gradient, variable) pairs
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]  # clip each gradient to [-1, 1]
train = optimizer.apply_gradients(capped_gvs)


That is with 10000 iterations and learning rate = 0.0005, but you should probably use a decaying learning rate, starting around there and decreasing it after a while.
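
For the decaying learning rate, a minimal sketch using tf.train.exponential_decay could look like the following; the decay_steps and decay_rate values are assumptions for illustration, not from the original answer:

global_step = tf.Variable(0, trainable=False)  # counts training steps; re-run the initializer after adding it
lr = tf.train.exponential_decay(0.0005, global_step, decay_steps=1000, decay_rate=0.9)
optimizer = tf.train.GradientDescentOptimizer(lr)
gvs = optimizer.compute_gradients(loss)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train = optimizer.apply_gradients(capped_gvs, global_step=global_step)  # increments global_step each step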
