Second derivative in Keras

Problem description

For a custom loss for a NN I use the function . u, given a pair (t, x), both points in an interval, is the output of my NN. The problem is that I'm stuck on how to compute the second derivative using K.gradients (K being the TensorFlow backend):

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):

        # so far, I can only get this right, naturally:            
        gradient = K.gradients(output_tensor, input_tensor)

        # here I'm falling badly:

        # d_t = K.gradients(output_tensor, input_tensor)[0]
        # dd_x = K.gradient(K.gradients(output_tensor, input_tensor),
        #                   input_tensor[1])

        return gradient # obviously not useful, just for it to work
    return loss  

All my attempts, based on Input(shape=(2,)), were variations of the commented lines in the snippet above, mainly trying to find the right indexing of the resulting tensor.

Sure enough, I lack knowledge of how exactly tensors work. By the way, I know that in TensorFlow itself I could simply use tf.hessians, but I noticed it's just not present when using TF as a backend.

Answer

In order for a K.gradients() layer to work like that, you have to enclose it in a Lambda() layer, because otherwise a full Keras layer is not created, and you can't chain it or train through it. So this code will work (tested):

import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def grad( y, x ):
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ] ) )
double = Input(tensor=tf.constant( [ 2.0 ] ) )

a = network( fixed_input, double )

b = grad( a, fixed_input )
c = grad( b, fixed_input )
d = grad( c, fixed_input )
e = grad( d, fixed_input )

model = Model( inputs = [ fixed_input, double ], outputs = [ a, b, c, d, e ] )

print( model.predict( x=None, steps = 1 ) )

def network models f(x) = log(x + 2) at x = 1. def grad is where the gradient calculation is done. This code outputs:

[array([1.0986123], dtype=float32), array([0.33333334], dtype=float32), array([-0.11111112], dtype=float32), array([0.07407408], dtype=float32), array([-0.07407409], dtype=float32)]

That is log(3), 1/3, -1/3², 2/3³ and -6/3⁴ respectively.
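
As a quick sanity check (not part of the original answer; plain Python with only the standard library), you can evaluate these analytic derivatives of f(x) = log(x + 2) at x = 1 and compare with the five outputs above:

import math

x = 1.0
print( [
    math.log( x + 2 ),        # f(x)     = log(3)  ~  1.0986123
    1 / ( x + 2 ),            # f'(x)    =  1/3    ~  0.3333333
    -1 / ( x + 2 ) ** 2,      # f''(x)   = -1/9    ~ -0.1111111
    2 / ( x + 2 ) ** 3,       # f'''(x)  =  2/27   ~  0.0740741
    -6 / ( x + 2 ) ** 4,      # f''''(x) = -6/81   ~ -0.0740741
] )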

For reference, the same code in plain TensorFlow (used for testing):

import tensorflow as tf

a = tf.constant( 1.0 )
a2 = tf.constant( 2.0 )

b = tf.log( a + a2 )
c = tf.gradients( b, a )
d = tf.gradients( c, a )
e = tf.gradients( d, a )
f = tf.gradients( e, a )

with tf.Session() as sess:
    print( sess.run( [ b, c, d, e, f ] ) )

Which outputs the same values:

[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]

Hessians

tf.hessians() does return the second derivative; it's just a shorthand for chaining two tf.gradients() calls. The Keras backend doesn't have hessians though, so you do have to chain two K.gradients() calls yourself.
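
Applied to the custom loss from the question, a minimal sketch of that chaining might look like the following. Assumptions not in the original: TF1-style graph mode, Input(shape=(2,)) with t in column 0 and x in column 1, and a placeholder loss expression at the end, since the actual formula was elided in the question:

from keras import backend as K

def custom_loss( input_tensor, output_tensor ):
    def loss( y_true, y_pred ):
        # first derivatives of u w.r.t. both input components, shape (batch, 2)
        grads = K.gradients( output_tensor, input_tensor )[ 0 ]
        d_t = grads[ :, 0 ]   # du/dt
        d_x = grads[ :, 1 ]   # du/dx
        # chain a second K.gradients() call to get d²u/dx²
        dd_x = K.gradients( d_x, input_tensor )[ 0 ][ :, 1 ]
        # placeholder residual, only to show how d_t and dd_x would be combined
        return K.mean( K.square( d_t - dd_x ) )
    return loss

Note that K.gradients() differentiates the sum of d_x over the batch; for a network that treats samples independently this still gives the per-sample second derivatives.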

If for some reason none of the above works, then you might want to consider numerically approximating the second derivative by taking differences over a small ε distance. This basically triples the network for each input, so this solution introduces serious efficiency considerations, besides lacking in accuracy. Anyway, the code (tested):

import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ], dtype = tf.float64 ) )
double = Input(tensor=tf.constant( [ 2.0 ], dtype = tf.float64 ) )

epsilon = Input( tensor = tf.constant( [ 1e-7 ], dtype = tf.float64 ) )
eps_reciproc = Input( tensor = tf.constant( [ 1e+7 ], dtype = tf.float64 ) )

a0 = network( Subtract()( [ fixed_input, epsilon ] ), double )
a1 = network(               fixed_input,              double )
a2 = network(      Add()( [ fixed_input, epsilon ] ), double )

d0 = Subtract()( [ a1, a0 ] )
d1 = Subtract()( [ a2, a1 ] )

dv0 = Multiply()( [ d0, eps_reciproc ] )
dv1 = Multiply()( [ d1, eps_reciproc ] )

dd0 = Multiply()( [ Subtract()( [ dv1, dv0 ] ), eps_reciproc ] )

model = Model( inputs = [ fixed_input, double, epsilon, eps_reciproc ], outputs = [ a0, dv0, dd0 ] )

print( model.predict( x=None, steps = 1 ) )

Outputs:

[array([1.09861226]), array([0.33333334]), array([-0.1110223])]

(This only gets to the second derivative.)
