Eager execution: gradient computation


Problem description

I'm wondering why this very simple gradient computation is not working correctly. It actually generates a [None, None] vector. Obviously, this is not the desired output.

import tensorflow as tf
tf.enable_eager_execution()

a = tf.constant(0.)
with tf.GradientTape() as tape:
    b = 2 * a
da, db = tape.gradient(a + b, [a, b])
print(da)
print(db)

Recommended answer

There are two minor issues with the code snippet you posted:

  1. The a + b computation happens outside the context of the tape, so it is not being recorded. Note that GradientTape can only differentiate computation that is recorded. Computing a + b inside the tape context will fix that.

  2. Source tensors need to be "watched". There are two ways to signal to the tape that a tensor should be watched: (a) explicitly invoking tape.watch, or (b) using a tf.Variable (all variables are watched automatically); see the documentation. A minimal sketch of option (b) follows this list.
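
For completeness, here is a minimal sketch of variant (b), keeping the snippet's TF 1.x-style eager setup. Because a is a tf.Variable, the tape watches it automatically and no tape.watch call is needed:

import tensorflow as tf
tf.enable_eager_execution()

# a tf.Variable is watched by the tape automatically,
# so no explicit tape.watch(a) is required
a = tf.Variable(0.)
with tf.GradientTape() as tape:
    b = 2 * a
    c = a + b  # recorded, because it happens inside the tape context
da, db = tape.gradient(c, [a, b])
print(da)  # dc/da = 3.0, since c = a + 2a = 3a
print(db)  # dc/db = 1.0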

Long story short, two trivial modifications to your snippet do the trick:

import tensorflow as tf
tf.enable_eager_execution()

a = tf.constant(0.)
with tf.GradientTape() as tape:
    tape.watch(a)  # fix 2: explicitly watch the source tensor
    b = 2 * a
    c = a + b      # fix 1: compute a + b inside the tape context
da, db = tape.gradient(c, [a, b])
print(da)
print(db)
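
With these changes, the tape records c = a + 2a = 3a, so the snippet should print a gradient of 3.0 for da and 1.0 for db. Note that in TensorFlow 2.x, eager execution is enabled by default, so the tf.enable_eager_execution() call is no longer needed (it survives only as tf.compat.v1.enable_eager_execution).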

Hope that helps.
