Slice of a Variable returns gradient None


Question

I've been playing around with the tf.gradients() function and came across a behavior I didn't expect. Namely, it seems to be unable to calculate the gradient of a sliced Variable. I put together an example that hopefully shows what I mean:

import tensorflow as tf

a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat(0, [a, b])
print(c)  # >Tensor("concat:0", shape=(2,), dtype=float32)

grad_full = tf.gradients(c,  c)
grad_slice1 = tf.gradients(c,  a)
grad_slice2 = tf.gradients(c,  c[:, ])  # --> Here the gradient is None
grad_slice3 = tf.gradients(c,  c[0, ])  # --> Here the gradient is None

print(grad_full)  # >[<tf.Tensor 'gradients/Fill:0' shape=(2,) dtype=float32>]
print(grad_slice1)  # >[<tf.Tensor 'gradients_1/concat_grad/Slice:0' shape=(1,) dtype=float32>]
print(grad_slice2)  # >[None]
print(grad_slice3)  # >[None]

sess = tf.Session()
sess.run(tf.initialize_all_variables())

grad_full_v, grad_slice_v = sess.run([grad_full[0], grad_slice1[0]])
print(grad_full_v)  # >[ 1.  1.]
print(grad_slice_v)  # >[ 1.]

My questions are:

1) Am I missing something, or is this the expected behavior?

2) If so, is there a reason for this behavior? In my understanding, slicing should not necessarily break backpropagation.

3) Does that mean I need to avoid slicing within my entire network (or at least on every path from a Variable to the loss)? For example, this would mean that I must not slice the output of a fully connected layer into several meaningful parts (like estimating multiple scalars with one fc layer and then slicing the joint estimate into the parts I want to use).
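Regarding question 3: slicing in the forward direction does not block backpropagation, because the sliced tensors still sit downstream of the Variables. Below is a minimal sketch of that scenario using the same TF 0.11-era graph API as the question; the layer sizes and the names x, W, b, fc are made up for illustration.

import tensorflow as tf

# Hypothetical fully connected layer: 3 inputs -> 4 outputs (sizes chosen arbitrarily).
x = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.random_normal([3, 4]))
b = tf.Variable(tf.zeros([4]))
fc = tf.matmul(x, W) + b            # joint estimate of 4 scalars

# Slice the joint estimate into two "meaningful" parts.
part1 = fc[:, :2]
part2 = fc[:, 2:]

# Build a loss from the slices; the loss still depends on W and b.
loss = tf.reduce_sum(tf.square(part1)) + tf.reduce_sum(part2)

# Gradients of the loss w.r.t. the Variables are defined, because the
# Variables sit upstream of the slices in the dependency graph.
print(tf.gradients(loss, [W, b]))   # two Tensors, not [None, None]

# Asking for the gradient of fc with respect to a slice of itself reproduces
# the situation from the question and yields None, since fc does not depend on part1.
print(tf.gradients(fc, part1))      # [None]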

I'm working with TensorFlow 0.11 RC0, built from source, on Ubuntu 16 with Python 3.5.

Answer

d = c[:, ] creates a different tensor from a, b, and c. If you look at the dependency graph, d depends on c, not the other way around, which is why the gradient is None here: tf.gradients(y, x) only works when y depends on x, not the reverse.
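Concretely, reversing the arguments does give a defined gradient, because the slice sits downstream of c. A minimal sketch along those lines, reusing the question's setup with the same TF 0.11-era API (d names the sliced tensor from the answer):

import tensorflow as tf

a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat(0, [a, b])   # TF 0.11 argument order: concat(concat_dim, values)

d = c[0, ]                 # a new tensor computed from c

# c does not depend on d, so differentiating c w.r.t. d yields None.
print(tf.gradients(c, d))  # [None]

# d does depend on c (and on a further upstream), so these are defined.
print(tf.gradients(d, c))  # [<tf.Tensor ...>], not None
print(tf.gradients(d, a))  # [<tf.Tensor ...>], not None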

