Second derivative in Keras

Problem Description

For a custom loss for an NN I use the function . u, given a pair (t, x), both points in an interval, is the output of my NN. The problem is that I'm stuck on how to compute the second derivative using K.gradients (K being the TensorFlow backend):

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):

        # so far, I can only get this right, naturally:            
        gradient = K.gradients(output_tensor, input_tensor)

        # here I'm falling badly:

        # d_t = K.gradients(output_tensor, input_tensor)[0]
        # dd_x = K.gradient(K.gradients(output_tensor, input_tensor),
        #                   input_tensor[1])

        return gradient # obviously not useful, just for it to work
    return loss  

All my attempts, based on Input(shape=(2,)), were variations of the commented lines in the snippet above, mainly trying to find the right indexing of the resulting tensor.

Sure enough, I lack knowledge of how exactly tensors work. By the way, I know that in TensorFlow itself I could simply use tf.hessians, but I noticed there is no equivalent in the Keras backend.
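
For reference, tf.hessians in plain TensorFlow 1.x looks roughly like this (a minimal sketch; the xs and ys tensors are stand-ins invented for illustration, not part of my network):

import tensorflow as tf

xs = tf.placeholder(tf.float32, shape=(2,))   # a stand-in (t, x) input
ys = tf.reduce_sum(tf.square(xs))             # a stand-in scalar output u
hess = tf.hessians(ys, xs)                    # list with one (2, 2) Hessian matrix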

Recommended Answer

In order for a K.gradients() layer to work like that, you have to enclose it in a Lambda() layer, because otherwise a full Keras layer is not created, and you can't chain it or train through it. So this code will work (tested):

import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

# wraps K.gradients() in a Lambda() layer so it becomes a real Keras layer
def grad( y, x ):
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )

# builds f( i ) = log( i + d )
def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ] ) )
double = Input(tensor=tf.constant( [ 2.0 ] ) )

a = network( fixed_input, double )   # log( 1 + 2 )

b = grad( a, fixed_input )   # first derivative
c = grad( b, fixed_input )   # second derivative
d = grad( c, fixed_input )   # third derivative
e = grad( d, fixed_input )   # fourth derivative

model = Model( inputs = [ fixed_input, double ], outputs = [ a, b, c, d, e ] )

print( model.predict( x=None, steps = 1 ) )

def network models f( x ) = log( x + 2 ) at x = 1. def grad is where the gradient calculation is done. This code outputs:

[array([1.0986123], dtype=float32), array([0.33333334], dtype=float32), array([-0.11111112], dtype=float32), array([0.07407408], dtype=float32), array([-0.07407409], dtype=float32)]

These are the correct values of log( 3 ) and its successive derivatives: since d/dx log( x + 2 ) = 1 / ( x + 2 ), the derivatives at x = 1 are 1/3, -1/3², 2/3³ and -6/3⁴.

For reference, the same code in plain TensorFlow (used for testing):

import tensorflow as tf

a = tf.constant( 1.0 )
a2 = tf.constant( 2.0 )

b = tf.log( a + a2 )
c = tf.gradients( b, a )
d = tf.gradients( c, a )
e = tf.gradients( d, a )
f = tf.gradients( e, a )

with tf.Session() as sess:
    print( sess.run( [ b, c, d, e, f ] ) )

Which outputs the same values:

[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]

[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]

黑森州

tf.hessians() 确实返回二阶导数,这是链接两个 tf.gradients().不过,Keras 后端没有 hessians,因此您必须链接两个 <代码>K.gradients().

Hessians

tf.hessians() does return the second derivative; it's shorthand for chaining two tf.gradients() calls. The Keras backend doesn't have hessians, though, so you do have to chain two K.gradients() calls yourself.
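
Applied back to the question's Input(shape=(2,)) setup, the chaining could look roughly like the sketch below. This is an untested illustration, not part of the original answer: it assumes t is column 0 and x is column 1 of input_tensor, and the final K.mean(K.square(...)) is just a placeholder for whatever residual the loss actually needs.

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        # first derivatives of u w.r.t. (t, x): a single tensor of shape (batch, 2)
        grads = K.gradients(output_tensor, input_tensor)[0]
        d_t = grads[:, 0:1]   # du/dt, assuming t is the first input column
        d_x = grads[:, 1:2]   # du/dx, assuming x is the second input column
        # chain a second K.gradients() call to get the second derivative w.r.t. x
        dd = K.gradients(d_x, input_tensor)[0]
        dd_x = dd[:, 1:2]     # d²u/dx²
        # placeholder combination; replace with the actual loss expression
        return K.mean(K.square(d_t - dd_x))
    return loss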

If for some reason none of the above works, then you might want to consider numerically approximating the second derivative by taking differences over a small ε distance. This basically triples the network for each input, so besides lacking accuracy, this solution introduces serious efficiency considerations. Anyway, the code (tested):

import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ], dtype = tf.float64 ) )
double = Input(tensor=tf.constant( [ 2.0 ], dtype = tf.float64 ) )

epsilon = Input( tensor = tf.constant( [ 1e-7 ], dtype = tf.float64 ) )
eps_reciproc = Input( tensor = tf.constant( [ 1e+7 ], dtype = tf.float64 ) )

# evaluate the network at x - ε, x and x + ε
a0 = network( Subtract()( [ fixed_input, epsilon ] ), double )
a1 = network(               fixed_input,              double )
a2 = network(      Add()( [ fixed_input, epsilon ] ), double )

# forward differences
d0 = Subtract()( [ a1, a0 ] )
d1 = Subtract()( [ a2, a1 ] )

# first-derivative estimates: ( f( x ) - f( x - ε ) ) / ε and ( f( x + ε ) - f( x ) ) / ε
dv0 = Multiply()( [ d0, eps_reciproc ] )
dv1 = Multiply()( [ d1, eps_reciproc ] )

# second-derivative estimate: ( f( x + ε ) - 2 f( x ) + f( x - ε ) ) / ε²
dd0 = Multiply()( [ Subtract()( [ dv1, dv0 ] ), eps_reciproc ] )

model = Model( inputs = [ fixed_input, double, epsilon, eps_reciproc ], outputs = [ a0, dv0, dd0 ] )

print( model.predict( x=None, steps = 1 ) )

Outputs:

[array([1.09861226]), array([0.33333334]), array([-0.1110223])]

(This only gets to the second derivative.)
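
As a check on the accuracy remark above: dd0 computes the standard central second difference ( f( x + ε ) - 2 f( x ) + f( x - ε ) ) / ε², so with ε = 1e-7 the result -0.1110223 is only an approximation of the exact value -1/3² ≈ -0.1111111 that the gradient-based version recovers.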
