What does compute_gradients return in TensorFlow


Question

# Mean squared error between targets y_ and predictions y
mean_sqr = tf.reduce_mean(tf.pow(y_ - y, 2))
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
# compute_gradients returns (gradient, variable) pairs; unzip them here
gradients, variables = zip(*optimizer.compute_gradients(mean_sqr))
opt = optimizer.apply_gradients(list(zip(gradients, variables)))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for j in range(TRAINING_EPOCHS):
    sess.run(opt, feed_dict={x: batch_xs, y_: batch_xs})

I don't clearly understand what compute_gradients returns. Does it return sum(dy/dx) for the x values supplied as batch_xs, with apply_gradients then performing an update such as

theta <- theta - LEARNING_RATE * 1/m * gradients

Or does it already return the average of the gradients summed over the x values in the batch, i.e. sum(dy/dx) * 1/m, where m is the batch size?

Answer

compute_gradients(a, b) returns d[ sum a ] / db. So in your case this returns d mean_sqr / d theta, where theta is the set of all variables. There is no "dx" in this equation; you are not computing gradients wrt the inputs. So what happens to the batch dimension? You remove it yourself in the definition of mean_sqr:

mean_sqr = tf.reduce_mean(tf.pow(y_ - y, 2))

thus (I am assuming y is 1D for simplicity)

d[ mean_sqr ] / d theta = d[ 1/M SUM_i=1^M (pred(x_i) - y_i)^2 ] / d theta
                        = 1/M SUM_i=1^M d[ (pred(x_i) - y_i)^2 ] / d theta

So you are in control of whether it sums over the batch, takes the mean, or does something else: if you defined mean_sqr using reduce_sum instead of reduce_mean, the gradients would be the sum over the batch, and so on.
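As a minimal sketch of that difference (the toy model with x_in, y_in, w, the data, and the learning rate are illustrative, not from the question), the gradient under reduce_sum comes out batch_size times larger than under reduce_mean:

import tensorflow as tf
import numpy as np

# Toy linear model: pred = x_in * w
x_in = tf.placeholder(tf.float32, [None, 1])
y_in = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable([[1.0]])
pred = tf.matmul(x_in, w)

loss_mean = tf.reduce_mean(tf.pow(y_in - pred, 2))  # averages over the batch
loss_sum = tf.reduce_sum(tf.pow(y_in - pred, 2))    # sums over the batch

optimizer = tf.train.AdamOptimizer(0.01)
grad_mean = optimizer.compute_gradients(loss_mean, var_list=[w])  # [(grad, w)]
grad_sum = optimizer.compute_gradients(loss_sum, var_list=[w])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = {x_in: np.array([[1.0], [2.0]]), y_in: np.array([[0.0], [0.0]])}
    gm, gs = sess.run([grad_mean[0][0], grad_sum[0][0]], feed_dict=batch)
    print(gm, gs)  # gs is batch_size (here 2) times gm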

On the other hand, apply_gradients simply "applies the gradients"; the exact update rule depends on the optimizer. For GradientDescentOptimizer it would be

theta <- theta - learning_rate * gradients(theta)

For Adam, which you are using, the update equation is of course more complex.
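For the plain GradientDescentOptimizer case the rule above can be checked directly; here is a hypothetical one-variable sketch (theta, loss, and lr are made-up names, not from the question) comparing apply_gradients with a manual tf.assign_sub update:

import tensorflow as tf

theta = tf.Variable(3.0)
loss = tf.pow(theta, 2)  # d loss / d theta = 2 * theta

lr = 0.1
optimizer = tf.train.GradientDescentOptimizer(lr)
grads_and_vars = optimizer.compute_gradients(loss, var_list=[theta])
train_op = optimizer.apply_gradients(grads_and_vars)

# The same update written out by hand: theta <- theta - lr * d loss / d theta
grad = grads_and_vars[0][0]
manual_op = tf.assign_sub(theta, lr * grad)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    print(sess.run(theta))  # 3.0 - 0.1 * 6.0 = 2.4
    sess.run(manual_op)
    print(sess.run(theta))  # 2.4 - 0.1 * 4.8 = 1.92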

Note however that tf.gradients is more like "backprop" than the true gradient in the mathematical sense, meaning that it relies on the graph dependencies and does not recognise dependencies that run in the "opposite" direction.
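A small illustration of that point, using made-up tensors a and b: tf.gradients only follows the forward graph, so asking for the gradient of a with respect to b returns None even though algebraically a = b / 3:

import tensorflow as tf

a = tf.constant(2.0)
b = 3.0 * a            # b depends on a in the graph

print(tf.gradients(b, a))  # [<tf.Tensor ...>], db/da exists (3.0)
print(tf.gradients(a, b))  # [None], no forward path from b to a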
