Eager execution: gradient computation
Problem description
I'm wondering why this very simple gradient computation isn't working correctly. It actually produces a [None, None] vector, which is obviously not the desired output.
import tensorflow as tf

tf.enable_eager_execution()

a = tf.constant(0.)
with tf.GradientTape() as tape:
    b = 2 * a
da, db = tape.gradient(a + b, [a, b])
print(da)
print(db)
Recommended answer
There are two minor issues with the code snippet you posted:
1. The a + b computation happens outside the tape context, so it is not recorded. Note that GradientTape can only differentiate computation that it has recorded; computing a + b inside the tape context will fix that.
2. Source tensors need to be "watched". There are two ways to signal to the tape that a tensor should be watched: (a) explicitly invoking tape.watch, or (b) using a tf.Variable (all variables are watched automatically); see the documentation.
Long story short, two trivial modifications to your snippet do the trick:
import tensorflow as tf

tf.enable_eager_execution()

a = tf.constant(0.)
with tf.GradientTape() as tape:
    tape.watch(a)  # (2) explicitly watch the source tensor
    b = 2 * a
    c = a + b      # (1) compute a + b inside the tape context
da, db = tape.gradient(c, [a, b])
print(da)  # 3.0, since dc/da = 1 + 2
print(db)  # 1.0
Hope that helps.