How to make a custom activation function with only Python in Tensorflow?


Problem description


Suppose you need to make an activation function which is not possible using only pre-defined tensorflow building-blocks. What can you do?

So in Tensorflow it is possible to make your own activation function. But it is quite complicated: you have to write it in C++ and recompile the whole of tensorflow [1] [2].

Is there a simpler way?

Solution

Yes, there is!

Credit: It was hard to find the information and get it working, but here is an example based on the principles and code found here and here.

Requirements: Before we start, there are two requirements for this to succeed. First, you need to be able to write your activation as a function on numpy arrays. Second, you have to be able to write the derivative of that function, either as a function in Tensorflow (easier) or, in the worst-case scenario, as a function on numpy arrays.

Writing Activation function:

So let's take for example this function which we would want to use as an activation function:

def spiky(x):
    r = x % 1
    if r <= 0.5:
        return r
    else:
        return 0

Which looks as follows:
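The plot from the original answer is not reproduced here. As a rough, optional sketch (assuming matplotlib is available, which the original answer does not require), you could visualize the shape yourself:

# Optional visualization sketch, not part of the original answer.
# Assumes matplotlib is installed; spiky is the function defined above.
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 3, 1000)
ys = np.vectorize(spiky)(xs)  # ramps from 0 to 0.5 on [n, n+0.5], zero elsewhere

plt.plot(xs, ys)
plt.xlabel("x")
plt.ylabel("spiky(x)")
plt.show()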

The first step is making it into a numpy function; this is easy:

import numpy as np
np_spiky = np.vectorize(spiky)

Now we should write its derivative.

Gradient of Activation: In our case it is easy: it is 1 if x mod 1 <= 0.5 and 0 otherwise. So:

def d_spiky(x):
    r = x % 1
    if r <= 0.5:
        return 1
    else:
        return 0
np_d_spiky = np.vectorize(d_spiky)
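As a quick sanity check (not in the original answer), the numpy versions can be evaluated on the same sample values used in the test at the end:

x = np.array([0.2, 0.7, 1.2, 1.7])
print(np_spiky(x))    # -> approximately [0.2  0.   0.2  0. ]
print(np_d_spiky(x))  # -> [1 0 1 0]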

Now for the hard part of making a TensorFlow function out of it.

Making a numpy fct into a tensorflow fct: We will start by making np_d_spiky into a tensorflow function. There is a function in tensorflow, tf.py_func(func, inp, Tout, stateful=stateful, name=name) [doc], which transforms any numpy function into a tensorflow function, so we can use it:

import tensorflow as tf
from tensorflow.python.framework import ops

np_d_spiky_32 = lambda x: np_d_spiky(x).astype(np.float32)


def tf_d_spiky(x,name=None):
    with tf.name_scope(name, "d_spiky", [x]) as name:
        y = tf.py_func(np_d_spiky_32,
                        [x],
                        [tf.float32],
                        name=name,
                        stateful=False)
        return y[0]

tf.py_func acts on lists of tensors (and returns a list of tensors), which is why we have [x] (and return y[0]). The stateful option tells tensorflow whether the function always gives the same output for the same input (stateful = False), in which case tensorflow can simplify the tensorflow graph; this is our case and will probably be the case in most situations. One thing to be careful of at this point is that numpy uses float64 but tensorflow uses float32, so you need to convert your function to use float32 before you can convert it to a tensorflow function, otherwise tensorflow will complain. This is why we need to make np_d_spiky_32 first.
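As a quick check (not part of the original answer, and assuming the TF 1.x session API used throughout), tf_d_spiky can already be evaluated, even though tensorflow cannot yet differentiate through it:

with tf.Session() as sess:
    x = tf.constant([0.2, 0.7, 1.2, 1.7])
    print(sess.run(tf_d_spiky(x)))  # -> [1. 0. 1. 0.]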

What about the Gradients? The problem with only doing the above is that even though we now have tf_d_spiky which is the tensorflow version of np_d_spiky, we couldn't use it as an activation function if we wanted to because tensorflow doesn't know how to calculate the gradients of that function.

Hack to get Gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc]. Copying the code from harpone, we can modify the tf.py_func function to make it define the gradient at the same time:

def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    
    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
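The _MySquareGrad mentioned in the comment above is not shown in the original sources. Purely as an illustration of the expected form (a sketch, not code from the answer), a gradient function for an op computing x squared could look like this:

def _MySquareGrad(op, grad):
    # Hypothetical example: for y = x**2, dy/dx = 2*x,
    # so the incoming gradient is scaled by 2*x.
    x = op.inputs[0]
    return grad * 2 * x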

Now we are almost done. The only thing left is that the grad function we pass to the above py_func function needs to take a special form: it takes in an operation and the gradient flowing back from after the operation, and it returns the gradient(s) to propagate backward to the operation's inputs.

Gradient Function: So for our spiky activation function that is how we would do it:

def spikygrad(op, grad):
    x = op.inputs[0]

    n_gr = tf_d_spiky(x)
    return grad * n_gr  

The activation function has only one input, that is why x = op.inputs[0]. If the operation had many inputs, we would need to return a tuple, one gradient for each input. For example, if the operation was a-b, the gradient with respect to a is +1 and with respect to b is -1, so we would have return +1*grad,-1*grad (see the sketch after the next code block). Notice that we need to return tensorflow functions of the input, that is why we need tf_d_spiky; np_d_spiky would not have worked because it cannot act on tensorflow tensors. Alternatively we could have written the derivative using tensorflow functions:

def spikygrad2(op, grad):
    x = op.inputs[0]
    r = tf.mod(x,1)
    n_gr = tf.to_float(tf.less_equal(r, 0.5))
    return grad * n_gr  
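To illustrate the multi-input case mentioned above, here is a hypothetical gradient function for an op computing a - b (a sketch for illustration only, not part of the original answer):

def subtract_grad(op, grad):
    # Hypothetical two-input example: d(a - b)/da = +1 and d(a - b)/db = -1,
    # so one gradient is returned per input, in the same order as op.inputs.
    return +1 * grad, -1 * grad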

Combining it all together: Now that we have all the pieces, we can combine them all together:

np_spiky_32 = lambda x: np_spiky(x).astype(np.float32)

def tf_spiky(x, name=None):
    
    with tf.name_scope(name, "spiky", [x]) as name:
        y = py_func(np_spiky_32,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=spikygrad)  # <-- here's the call to the gradient
        return y[0]

And now we are done. We can test it.

Test:

with tf.Session() as sess:

    x = tf.constant([0.2,0.7,1.2,1.7])
    y = tf_spiky(x)
    tf.initialize_all_variables().run()
    
    print(x.eval(), y.eval(), tf.gradients(y, [x])[0].eval())

[ 0.2 0.69999999 1.20000005 1.70000005] [ 0.2 0. 0.20000005 0.] [ 1. 0. 1. 0.]

Success!
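As a final, optional illustration (not from the original answer), tf_spiky can be used like any other activation, for example in a small dense layer. The variables W and b and the shapes below are hypothetical, and the TF 1.x API is assumed:

x = tf.placeholder(tf.float32, shape=[None, 4])
W = tf.Variable(tf.random_normal([4, 3]))
b = tf.Variable(tf.zeros([3]))

h = tf_spiky(tf.matmul(x, W) + b)  # gradients flow through spikygrad during training

One caveat: the output of tf.py_func has no static shape, so downstream code that needs one may have to call set_shape on the result.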
