How to make sure your computation graph is differentiable
Question
Some of the TensorFlow operations (e.g. tf.argmax) are not differentiable (i.e. no gradients are calculated and used in back-propagation).
An answer to "Tensorflow what operations are differentiable and what are not?" suggests searching for RegisterGradient in the TensorFlow code. I also noticed TensorFlow has a tf.NotDifferentiable API call for declaring an operation to be non-differentiable.
Is there a warning issued if I use non-differentiable functions? Is there a programmatic way to ensure that my entire computation graph is differentiable?
Answer
Most floating point operations will have gradients, so a first pass answer would just be to check that there are no int32/int64 dtype Tensors in the graph. This is easy to do, but probably not useful (i.e. any non-trivial model will be doing non-differentiable indexing operations).
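That first-pass dtype check can be sketched as follows. This is a minimal illustration on a hypothetical graph, written against the tf.compat.v1 API so it also runs under TensorFlow 2.x:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Build a small example graph containing an integer-valued op (argmax).
graph = tf1.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, shape=[3], name="x")
    best = tf.argmax(x)  # produces an int64 Tensor

# First-pass check: flag any int32/int64 tensors in the graph.
integer_tensors = [t for op in graph.get_operations()
                   for t in op.outputs
                   if t.dtype in (tf.int32, tf.int64)]
print([t.name for t in integer_tensors])
```

As the text notes, this will flag plenty of harmless tensors (shape and index arguments are integers too), so treat it as a coarse filter at best.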
You could do some type of introspection, looping over the operations in the GraphDef and checking that they have gradients registered. I would argue that this is not terribly useful either; if we don't trust that gradients are registered in the first place, why trust that they're correct if registered?
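One way to sketch that introspection is with the internal ops.get_gradient_function helper. Note this is an unstable internal API that may move or change between TensorFlow versions, and the graph below (a gradient-less py_func) is just a hypothetical example:

```python
import numpy
import tensorflow as tf
from tensorflow.python.framework import ops  # internal API; may change

tf1 = tf.compat.v1
tf1.disable_eager_execution()

graph = tf1.Graph()
with graph.as_default():
    inp = tf1.placeholder(tf.float32, name="inp")
    y = tf1.py_func(numpy.sinh, [inp], tf.float32) + inp

# Collect op types that have no gradient function registered.
missing = []
for op in graph.get_operations():
    try:
        grad_fn = ops.get_gradient_function(op)
    except LookupError:
        grad_fn = None
    if op.inputs and grad_fn is None:
        missing.append(op.type)
print(missing)
```

A caveat in the spirit of the paragraph above: an op may be registered via tf.NotDifferentiable on purpose (constants, shapes), so a hit in this list is a prompt for inspection, not proof of a bug.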
Instead, I'd go with numerical gradient checking at a few points for your model. For example, let's say we register a PyFunc without a gradient:
import tensorflow as tf
import numpy

def my_func(x):
    return numpy.sinh(x)

with tf.Graph().as_default():
    inp = tf.placeholder(tf.float32)
    # PyFunc has no gradient, so tf.gradients silently treats it as a
    # constant; only the "+ inp" term contributes to the gradient.
    y = tf.py_func(my_func, [inp], tf.float32) + inp
    grad, = tf.gradients(y, inp)
    with tf.Session() as session:
        print(session.run([y, grad], feed_dict={inp: 3}))
        # Compare the symbolic gradient against a finite-difference estimate.
        print("Gradient error:", tf.test.compute_gradient_error(inp, [], y, []))
which gives me output like:
[13.017875, 1.0]
Gradient error: 1.10916996002
Numerical gradients can be a bit tricky, but generally any gradient error more than a few orders of magnitude above machine epsilon (~1e-7 for float32) would raise red flags for me for a supposedly smooth function.
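For contrast, running the same check on a fully differentiable version of the function (tf.sinh used directly rather than through py_func) should give an error near machine precision. This sketch again uses tf.compat.v1 so it runs under TensorFlow 2.x:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

with tf1.Graph().as_default():
    inp = tf1.placeholder(tf.float32, shape=[])
    y = tf.sinh(inp) + inp  # same function, but with a real gradient
    with tf1.Session():
        # Symbolic and numeric Jacobians now agree to roughly float32
        # finite-difference precision.
        err = tf1.test.compute_gradient_error(inp, [], y, [])
print("Gradient error:", err)
```

Compare this small error to the error of roughly 1.1 in the py_func example above: that large discrepancy is exactly the signal that a gradient is silently missing.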