Tensorflow: gradient_override_map cannot override op tf.stack's backward gradient


Problem description

I was trying to edit tf.stack op's backward gradient calculation mechanism with tf.RegisterGradient and tf.gradient_override_map; here is my code:

import tensorflow as tf

class SynthGradBuilder(object):
    def __init__(self):
        self.num_calls = 0

    def __call__(self, x, l=1.0):
        op_name = "SynthGrad%d" % self.num_calls
        @tf.RegisterGradient(op_name)
        def _grad_synth(op, grad):
            return grad[0]

        g = tf.get_default_graph()
        with g.gradient_override_map({"stack": op_name}):
            y = tf.stack([x,x])

        self.num_calls += 1
        return y

GradSys = SynthGradBuilder()

In another script, I wrote:

import tensorflow as tf
from gradient_synthesizer import GradSys

x = tf.Variable([1,2])
y = GradSys(x, l=1)
z = tf.stack([x,x])


grad = tf.gradients(y, x, grad_ys=[[tf.convert_to_tensor([3, 4]),
                                    tf.convert_to_tensor([6, 8])]])
grad_stack = tf.gradients(z, x, grad_ys=[[tf.convert_to_tensor([3, 4]),
                                          tf.convert_to_tensor([6, 8])]])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    print "grad bp: ", sess.run(grad)
    print "grad_stack: ", sess.run(grad_stack)
    print "y: ", sess.run(y)

The expected outputs should be:

grad bp: [3,4];
grad_stack: [3+6, 4+8] = [9, 12];
y: [[1,2], [1,2]];

What I actually got from the code was grad bp identical to grad_stack (both [9, 12]), indicating that tf.stack's backward gradients were not replaced at all, which was the opposite of my expectation.

I was not sure whether this discrepancy came from wrongly using "stack" as the type string of op tf.stack, so I ran an experiment: I printed tensor y and looked at its description.

The first item describing tensor y, "stack:0", suggests that op tf.stack's registered name is "stack", which should also be its type string. So it seems "stack" is not at fault.

I am at a loss to figure out the cause of this problem. I wonder if anyone can help me with it.

Recommended answer

TL;DR: the correct code should be:

@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
  x, y = tf.unstack(grad)
  return [x, tf.zeros_like(y)]

g = tf.get_default_graph()
with g.gradient_override_map({"Pack": op_name}):
  y = tf.stack([x, x])


Because this is a quite common question, I want to explain it in a bit more detail:

There are two main issues in your original code:

  1. Wrong usage of gradient_override_map:

The actual OP name for tf.stack is Pack (not Stack), so you need to override Pack instead of Stack:

`g.gradient_override_map({"Pack": op_name})`.

You may wonder how to find the actual OP name. A simple way is to probe the GraphDef by running the following code:

with tf.Graph().as_default():
  x = tf.constant(0)
  y = tf.stack([x, x])
  # In the dump, the relevant node reads: node { name: "stack" op: "Pack" ... }
  print(tf.get_default_graph().as_graph_def())
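
Alternatively, a quicker check (same TF 1.x setup): read the name and the type directly off the op. The name is what the experiment in the question looked at, while the type is what gradient_override_map actually keys on:

with tf.Graph().as_default():
  x = tf.constant(0)
  y = tf.stack([x, x])
  print(y.name)     # stack:0 -- the tensor's *name* (what the question's check saw)
  print(y.op.type)  # Pack    -- the op's *type* (what gradient_override_map matches)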

  2. Wrong gradient function:

The original gradient for Pack is a simple Unpack (official code). In your case, you still need to unpack the gradients first, but propagate only the FIRST part:

@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
  x, y = tf.unstack(grad)
  return [x, tf.zeros_like(y)]
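
(A note on the return value, as I read the gradient-function contract: the function receives the gradient flowing into the op's output and must return one gradient per op input. tf.stack([x, x]) builds a Pack op with two inputs, hence the two-element list; zeroing the second element means only the first copy's gradient flows back to x.)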

Note that this code works perfectly for your case. However, if you want to support stacks of any length, you can use a slightly more complicated version:

@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
  x_list = tf.unstack(grad)
  for i in range(1, len(x_list)):
    x_list[i] = tf.zeros_like(x_list[i])
  return x_list
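
Putting the two fixes together, here is a minimal end-to-end sketch (my own reconstruction, not from the original answer; it assumes TF 1.x graph mode, uses float values so the gradients are well-defined, and hard-codes the illustrative name "SynthGrad0" in place of the builder's call counter):

import tensorflow as tf

# Gradient that keeps only the first slice of the incoming gradient.
@tf.RegisterGradient("SynthGrad0")
def _grad_synth(op, grad):
  x, y = tf.unstack(grad)        # incoming gradient has shape [2, 2]
  return [x, tf.zeros_like(y)]   # one gradient per Pack input

g = tf.get_default_graph()
x = tf.Variable([1.0, 2.0])      # floats instead of the question's ints
with g.gradient_override_map({"Pack": "SynthGrad0"}):  # key is "Pack", not "stack"
  y = tf.stack([x, x])

grad = tf.gradients(y, x, grad_ys=[[tf.convert_to_tensor([3.0, 4.0]),
                                    tf.convert_to_tensor([6.0, 8.0])]])

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print(sess.run(grad))  # expected: [array([3., 4.], dtype=float32)]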
