Derivatives of n-dimensional function in Keras

Problem description

Say I have a bivariate function, for example z = x^2 + y^2. I learned that in Keras I can compute nth-order derivatives using Lambda layers:

from keras.layers import Input, Lambda, Add
from keras import backend as K

def bivariate_function(x, y):
    x2 = Lambda(lambda u: K.pow(u, 2))(x)
    y2 = Lambda(lambda u: K.pow(u, 2))(y)
    return Add()([x2, y2])

def derivative(y,x):
    return Lambda(lambda u: K.gradients(u[0],u[1]))([y,x])
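
# Assumed setup (not shown in the question): x and y as symbolic Keras inputs
# with one scalar feature each, so the snippet below has concrete tensors to work on.
x = Input(shape=(1,))
y = Input(shape=(1,))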

f = bivariate_function(x, y)
df_dx = derivative(f, x)       # 1st derivative w.r.t. x
df_dy = derivative(f, y)       # 1st derivative w.r.t. y
df_dx2 = derivative(df_dx, x)  # 2nd derivative w.r.t. x
df_dy2 = derivative(df_dy, y)  # 2nd derivative w.r.t. y

However, how do I apply this approach to the derivatives of an NN output with respect to its inputs in the loss function? I can't (?) simply feed two inputs into a dense layer like the ones created above.

For example, trying to use as the loss the sum of the first derivative with respect to the first variable and the second derivative with respect to the second variable (i.e. d/dx + d²/dy²), and using Input(shape=(2,)), I managed to arrive here:

import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
from keras import backend as K

def grad(f, x):
    return Lambda(lambda u: K.gradients(u[0], u[1]), output_shape=[2])([f, x])
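
# Note: K.gradients wraps tf.gradients and returns a list of gradient tensors,
# one per tensor passed as u[1].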

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):

        df1 = grad(output_tensor, input_tensor)  # 1st derivatives w.r.t. both inputs
        df2 = grad(df1, input_tensor)            # 2nd derivatives
        df = tf.add(df1[0, 0], df2[0, 1])        # intended as d/dx + d²/dy²

        return df
    return loss

input_tensor = Input(shape=(2,))
hidden_layer = Dense(100, activation='relu')(input_tensor)
output_tensor = Dense(1, activation='softplus')(hidden_layer)

model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor,output_tensor), optimizer='sgd')

xy = np.mgrid[-3.0:3.0:0.1, -3.0:3.0:0.1].reshape(2,-1).T
model.fit(x=xy,y=xy, batch_size=10, epochs=100, verbose=2)

But it just feels like I'm not doing it the proper way. Even worse, after the first epoch I'm getting nothing but NaNs.

Answer

The main issue here is theoretical.

You're trying to minimize d output_tensor/dx + d²output_tensor/dx². Your network just linearly combines the input x-s, albeit with relu and softplus activations. Well, softplus adds a bit of a twist, but that too has a monotonically increasing derivative. Therefore, to make the derivative as small as possible, the network will simply scale the input up as much as it can with negative weights, so the derivative becomes a really large negative number, at some point reaching NaN. I reduced the first layer to 5 neurons and ran the model for 2 epochs, and the weights became:

('dense_1',
[array([[ 1.0536456 , -0.32706773, 0.0072904 , 0.01986691, 0.9854533 ],
[-0.3242108 , -0.56753945, 0.8098554 , -0.7545874 , 0.2716419 ]],
dtype=float32),
array([ 0.01207507, 0.09927677, -0.01768671, -0.12874101, 0.0210707 ], dtype=float32)])

('dense_2', [array([[-0.4332278 ], [ 0.6621602 ], [-0.07802075], [-0.5798264 ], [-0.40561703]],
dtype=float32),
array([0.11167384], dtype=float32)])

You can see that the second layer keeps a negative sign where the first has a positive, and vice versa. (Biases don't get any gradient because they don't contribute to the derivative. Well, not exactly true because of the softplus but more or less.)

So you have to come up with a loss function that does not diverge towards extreme parameter values, because such a loss is not trainable: the optimizer will just keep increasing the weights until they become NaN.

Here is the version I ran:

import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
from keras import backend as K

def grad(f, x):
    return Lambda(lambda u: K.gradients(u[0], u[1]), output_shape=[2])([f, x])

def ngrad(f, x, n):
    if 0 == n:
        return f
    else:
        return Lambda(lambda u: K.gradients(u[0], u[1]), output_shape=[2])([ngrad( f, x, n - 1 ), x])
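
# (ngrad is defined here but not used in the loss below; for example,
#  ngrad(output_tensor, input_tensor, 2) would chain the same Lambda/K.gradients
#  wrapping twice to obtain 2nd derivatives.)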

def custom_loss(input_tensor,output_tensor):
    def loss(y_true, y_pred):

        _df1 = grad(output_tensor,input_tensor)
        df1 = tf.Print( _df1, [ _df1 ], message = "df1" )
        _df2 = grad(df1,input_tensor)
        df2 = tf.Print( _df2, [ _df2 ], message = "df2" )
        df = tf.add(df1,df2)      

        return df
    return loss

input_tensor = Input(shape=(2,))
hidden_layer = Dense(5, activation='softplus')(input_tensor)
output_tensor = Dense(1, activation='softplus')(hidden_layer)

model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor,output_tensor), optimizer='sgd')

xy = np.mgrid[-3.0:3.0:0.1, -3.0:3.0:0.1].reshape( 2, -1 ).T
#print( xy )
model.fit(x=xy,y=xy, batch_size=10, epochs=2, verbose=2)
for layer in model.layers: print(layer.get_config()['name'], layer.get_weights())
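
As a rough sketch of the kind of non-divergent loss suggested above (my own assumption, not part of the original answer), one option is to penalise the square of the same derivative expression, so the loss is bounded below by zero and cannot be improved indefinitely by inflating the weights:

def bounded_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        df1 = grad(output_tensor, input_tensor)  # 1st derivatives
        df2 = grad(df1, input_tensor)            # 2nd derivatives
        residual = tf.add(df1, df2)              # same quantity as before ...
        return K.mean(K.square(residual))        # ... but squared, so the loss is >= 0
    return loss

# model.compile(loss=bounded_loss(input_tensor, output_tensor), optimizer='sgd')

This keeps the structure of the loss above but turns it into a residual that is minimised at zero rather than at minus infinity.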
