Tensorflow: How to write op with gradient in python?


Question


I would like to write a TensorFlow op in python, but I would like it to be differentiable (to be able to compute a gradient).

This question asks how to write an op in python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python

The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html

In my case, I am prototyping, so I don't care whether it runs on GPU, and I don't care about it being usable from anything other than the TF python API.

Solution

Yes, as mentioned in @Yaroslav's answer, it is possible, and the key is the links he references: here and here. I want to elaborate on that answer by giving a concrete example.

Modulo operation: Let's implement the element-wise modulo operation in tensorflow (it already exists, but its gradient is not defined; for this example we will implement it from scratch).

Numpy function: The first step is to define the operation we want for numpy arrays. The element-wise modulo operation is already implemented in numpy, so this is easy:

import numpy as np
def np_mod(x, y):
    # element-wise remainder, cast to float32 (see the note below)
    return (x % y).astype(np.float32)

The reason for the .astype(np.float32) is that tensorflow defaults to float32 types, and if you give it float64 (the numpy default) it will complain.
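To make the dtype point concrete, here is a minimal check (assuming the np_mod defined above is in scope):

import numpy as np
a = np.array([1.2, 1.7])   # float64, the numpy default
b = np.array([1.0, 2.9])
print((a % b).dtype)       # float64 -- would not match a declared tf.float32 output
print(np_mod(a, b).dtype)  # float32, thanks to the cast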

Gradient function: Next we need to define the gradient function for our operation, for each input of the operation, as a tensorflow function. The function needs to take a very specific form: it takes the tensorflow representation of the operation op and the gradient of the output grad, and says how to propagate the gradients. In our case, the gradients of the mod operation are easy. Since x mod y = x - y*floor(x/y), the derivative is 1 with respect to the first argument and -floor(x/y) with respect to the second (almost everywhere; it is infinite at a finite number of spots, but let's ignore that, see https://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator for details). So we have

def modgrad(op, grad):
    x = op.inputs[0]  # the first argument (normally you need the inputs to calculate the gradient, like the gradient of x^2 is 2x)
    y = op.inputs[1]  # the second argument
    return grad * 1, grad * tf.neg(tf.floordiv(x, y))  # the propagated gradient with respect to the first and second argument respectively

The grad function needs to return an n-tuple where n is the number of arguments of the operation. Notice that we need to return tensorflow expressions of the inputs.
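For comparison, here is the same pattern for a single-input element-wise square operation (a minimal sketch in the spirit of the _MySquareGrad example referenced in the code below; the name squaregrad is hypothetical):

def squaregrad(op, grad):
    x = op.inputs[0]     # the op's single input
    return grad * 2 * x  # chain rule: d(x^2)/dx = 2x; a single value since the op has one argument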

Making a TF function with gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function using tf.RegisterGradient and tf.Graph.gradient_override_map.

Copying the code from harpone, we can modify the tf.py_func function to make it define the gradient at the same time:

import numpy as np
import tensorflow as tf

def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

The stateful option tells tensorflow whether the function always gives the same output for the same inputs (stateful=False), in which case tensorflow can simplify the tensorflow graph; this is our case, and it will probably be the case in most situations.
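As a counter-example, a wrapped function that is not a pure function of its inputs, e.g. one that adds random noise, should keep stateful=True so tensorflow does not merge or fold repeated calls (a minimal sketch; np_noisy_mod is a hypothetical name):

def np_noisy_mod(x, y):
    # different output on every call for the same inputs, so stateful must stay True
    return ((x % y) + np.random.normal(scale=0.01, size=x.shape)).astype(np.float32)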

Combining it all together: Now that we have all the pieces, we can put them together:

from tensorflow.python.framework import ops

def tf_mod(x, y, name=None):

    with ops.op_scope([x, y], name, "mod") as name:
        z = py_func(np_mod,
                    [x, y],
                    [tf.float32],
                    name=name,
                    grad=modgrad)  # <-- here's the call to the gradient
        return z[0]

tf.py_func acts on lists of tensors (and returns a list of tensors), which is why we pass [x, y] (and return z[0]). And now we are done, and we can test it.
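To make the list-in/list-out convention explicit, here is what a direct call looks like (a minimal sketch reusing np_mod):

x = tf.constant([1.2, 1.7])
y = tf.constant([1.0, 2.9])
z_list = tf.py_func(np_mod, [x, y], [tf.float32])  # returns a list containing one tensor
z = z_list[0]                                      # the actual output tensor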

Test:

with tf.Session() as sess:

    x = tf.constant([0.3,0.7,1.2,1.7])
    y = tf.constant([0.2,0.5,1.0,2.9])
    z = tf_mod(x,y)
    gr = tf.gradients(z, [x,y])
    tf.initialize_all_variables().run()

    print(x.eval(), y.eval(),z.eval(), gr[0].eval(), gr[1].eval())

[ 0.30000001 0.69999999 1.20000005 1.70000005] [ 0.2 0.5 1. 2.9000001] [ 0.10000001 0.19999999 0.20000005 1.70000005] [ 1. 1. 1. 1.] [ -1. -1. -1. 0.]
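The gradient with respect to y can be cross-checked against the analytic formula -floor(x/y) with plain numpy (a minimal sketch; note that numpy prints -0. for the last entry, where floor(x/y) is 0):

import numpy as np
x = np.array([0.3, 0.7, 1.2, 1.7], dtype=np.float32)
y = np.array([0.2, 0.5, 1.0, 2.9], dtype=np.float32)
print(-np.floor(x / y))  # [-1. -1. -1. -0.], matching gr[1] above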

Success!
