Return Inverse Hessian Matrix at the End of DNN Training and Partial Derivatives w.r.t. the Inputs


Question

Using Keras with TensorFlow as the backend, I have built a DNN that takes stellar spectra as input (7213 data points) and outputs three stellar parameters (temperature, gravity, and metallicity). The network trains well and predicts well on my test set, but in order for the results to be scientifically useful, I need to be able to estimate my errors. The first step in doing this is to obtain the inverse Hessian matrix, which doesn't seem to be possible using Keras alone. Therefore I am attempting to create a workaround with scipy, using scipy.optimize.minimize with BFGS, L-BFGS-B, or Newton-CG as the method. Any of these will return the inverse Hessian matrix.
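(Editorial aside, hedged, on why the inverse Hessian is the target here: under a Gaussian-noise assumption, with L the negative log-likelihood of the fit, the Laplace approximation gives

    Cov(theta_hat) ≈ H^(-1),    where H = ∇²_theta L(theta_hat),

so the square roots of the diagonal of H^(-1) provide approximate 1-sigma errors on the fitted parameters.)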

The idea is to train the model using the Adam optimizer for 100 epochs (or until the model converges) and then run a single iteration of BFGS (or one of the other methods) to return the Hessian matrix of my model.

Here is my code:

from scipy.optimize import minimize

import numpy as np

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import Adam


# Define vars
activation = 'relu'
init = 'he_normal'
beta_1 = 0.9
beta_2 = 0.999
epsilon = 1e-08

n = 7213  # number of input data points per spectrum
input_shape = (None, n)
n_hidden = [2048,1024,512,256,128,32]
output_dim = 3

epochs = 100
lr = 0.0008
batch_size = 64
decay = 0.00

# Design DNN layers
model = Sequential([
    Dense(n_hidden[0], batch_input_shape=input_shape, init=init, activation=activation),
    Dense(n_hidden[1], init=init, activation=activation),
    Dense(n_hidden[2], init=init, activation=activation),
    Dense(n_hidden[3], init=init, activation=activation),
    Dense(n_hidden[4], init=init, activation=activation),
    Dense(n_hidden[5], init=init, activation=activation),
    Dense(output_dim, init=init, activation='linear'),
])


# Optimization function
optimizer = Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon, decay=decay)


# Compile and train network
model.compile(optimizer=optimizer, loss='mean_squared_error')

#train_X.shape = (50000,7213)
#train_Y.shape = (50000,3)
#cv_X.shape = (10000,7213)
#cv_Y.shape = (10000,3)

history = model.fit(train_X, train_Y, validation_data=(cv_X, cv_Y),
             nb_epoch=epochs, batch_size=batch_size, verbose=2)


weights = []
for layer in model.layers:
    weights.append(layer.get_weights())

def loss(W):
    # Load the candidate weights into the model, then evaluate the MSE
    # over the training set.
    new_weights = [np.array(w) for w in W]
    model.set_weights(new_weights)
    preds = model.predict(train_X)
    mse = np.sum(np.square(preds - train_Y)) / len(train_X)
    print(mse)
    return mse


x0 = weights
res = minimize(loss, x0, args=(), method='BFGS', options={'maxiter': 1, 'eps': 1e-6, 'disp': True})
#res = minimize(loss, x0, method='L-BFGS-B', options={'disp': True, 'maxls': 1, 'gtol': 1e-05, 'eps': 1e-08, 'maxiter': 1, 'ftol': 0.5, 'maxcor': 1, 'maxfun': 1})
#res = minimize(loss, x0, args=(), method='Newton-CG', jac=None, hess=None, hessp=None, tol=None, callback=None, options={'disp': False, 'xtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': 1})
inv_hess = res['hess_inv']

1) The model trains extremely well, but when attempting to run the scipy minimizer for a single iteration with the previously trained weights, I run into problems.

Output when trying method=BFGS:

0.458706819754
0.457811632697
0.458706716791
...
0.350124572422
0.350186770445
0.350125320636

ValueErrorTraceback (most recent call last)
---> 19 res = minimize(loss, x0, args=(), method = 'BFGS', tol=1, options={'maxiter':1,'eps':1e-6,'disp':True})#,'gtol':0.1}, tol=5)

/opt/anaconda3/lib/python2.7/site-packages/scipy/optimize/_minimize.pyc in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    442         return _minimize_cg(fun, x0, args, jac, callback, **options)
    443     elif meth == 'bfgs':
--> 444         return _minimize_bfgs(fun, x0, args, jac, callback, **options)

/opt/anaconda3/lib/python2.7/site-packages/scipy/optimize/optimize.pyc in _minimize_bfgs(fun, x0, args, jac, callback, gtol, norm, eps, maxiter, disp, return_all, **unknown_options)
    963         try:  # this was handled in numeric, let it remaines for more safety
--> 964             rhok = 1.0 / (numpy.dot(yk, sk))
    965         except ZeroDivisionError:
    966             rhok = 1000.0

ValueError: operands could not be broadcast together with shapes (7213,2048) (2048,1024) 

Output when trying method=L-BFGS-B:

ValueErrorTraceback (most recent call last)

---> 20 res = minimize(loss, x0, method='L-BFGS-B', options={'disp': True, 'maxls': 1, 'gtol': 1e-05, 'eps': 1e-08, 'maxiter': 1, 'ftol': 0.5, 'maxcor': 1, 'maxfun': 1})


/opt/anaconda3/lib/python2.7/site-packages/scipy/optimize/_minimize.pyc in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    448     elif meth == 'l-bfgs-b':
    449         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 450                                 callback=callback, **options)


/opt/anaconda3/lib/python2.7/site-packages/scipy/optimize/lbfgsb.pyc in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    300         raise ValueError('maxls must be positive.')
    301 
--> 302     x = array(x0, float64)
    303     f = array(0.0, float64)
    304     g = zeros((n,), float64)

ValueError: setting an array element with a sequence.

Output when trying method=Newton-CG:

ValueErrorTraceback (most recent call last)

---> 21 res = minimize(loss, x0, args=(), method='Newton-CG', jac=None, hess=None, hessp=None, tol=None, callback=None, options={'disp': False, 'xtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': 1})


/opt/anaconda3/lib/python2.7/site-packages/scipy/optimize/_minimize.pyc in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    445     elif meth == 'newton-cg':
    446         return _minimize_newtoncg(fun, x0, args, jac, hess, hessp, callback,
--> 447                                   **options)
    448     elif meth == 'l-bfgs-b':
    449         return _minimize_lbfgsb(fun, x0, args, jac, bounds,

/opt/anaconda3/lib/python2.7/site-packages/scipy/optimize/optimize.pyc in _minimize_newtoncg(fun, x0, args, jac, hess, hessp, callback, xtol, eps, maxiter, disp, return_all, **unknown_options)
   1438     _check_unknown_options(unknown_options)
   1439     if jac is None:
-> 1440         raise ValueError('Jacobian is required for Newton-CG method')

ValueError: Jacobian is required for Newton-CG method
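Editorial note, hedged: all three failures are consistent with x0 being a nested list of weight arrays. scipy.optimize.minimize expects x0 to be a single flat 1-D vector, and Newton-CG additionally requires an explicit jac. A minimal sketch of the flatten/unflatten workaround, reusing model, train_X, and train_Y from above (flatten_weights, unflatten_weights, and loss_flat are illustrative names, not library calls):

from scipy.optimize import minimize
import numpy as np

# Remember each weight array's shape so the flat vector can be unpacked.
shapes = [w.shape for w in model.get_weights()]

def flatten_weights(weight_list):
    # Concatenate every weight/bias array into one flat 1-D vector.
    return np.concatenate([w.ravel() for w in weight_list])

def unflatten_weights(flat):
    # Slice the flat vector back into arrays matching the model's shapes.
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(flat[start:start + size].reshape(shape))
        start += size
    return arrays

def loss_flat(flat):
    model.set_weights(unflatten_weights(flat))
    preds = model.predict(train_X)
    return np.sum(np.square(preds - train_Y)) / len(train_X)

x0 = flatten_weights(model.get_weights())
res = minimize(loss_flat, x0, method='BFGS',
               options={'maxiter': 1, 'eps': 1e-6, 'disp': True})
inv_hess = res['hess_inv']  # dense (n_params, n_params) array

Be warned, though, that the flattened parameter vector of the network above has roughly 17 million entries (7213 x 2048 for the first layer alone), so the dense inverse Hessian BFGS builds would be far too large to store; in practice this route is only feasible for much smaller models or a subset of the weights.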

2) The next task is to obtain the derivatives of the model outputs with respect to the model inputs. For instance, for one stellar parameter (one of the outputs), say temperature, I need to find the partial derivatives with respect to each of the 7213 inputs, and then do the same for each of the 3 outputs.

So basically, my first task (1) is to find a way to return the inverse Hessian matrix of my model, and next (2) I need to find a way to return the first-order partial derivatives of my outputs with respect to my inputs.
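For task (2), a hedged sketch: since the model runs on the TensorFlow backend, the symbolic gradient can be requested from the Keras backend directly. Summing over the batch before differentiating is safe here because each prediction depends only on its own input row, so the summed gradient still contains the per-example partial derivatives:

from keras import backend as K

# Gradient of one output (index 0, e.g. temperature) w.r.t. the input tensor.
grad_T = K.gradients(K.sum(model.output[:, 0]), model.input)[0]
get_grad_T = K.function([model.input], [grad_T])

# Rows are examples, columns the 7213 input partial derivatives.
dT_dx = get_grad_T([train_X])[0]   # shape (n_examples, 7213)

Repeating this for output indices 1 and 2 would give the gravity and metallicity gradients; this assumes a deterministic network (no dropout or batch normalization), which matches the architecture above.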

Does anyone have some insight on either of these two tasks? Thanks.

EDIT

I am trying to use theano.gradient.jacobian() to return the Jacobian matrix of my outputs w.r.t. my inputs. I have turned my model into a function of the model weights and used that function as the first parameter of theano.gradient.jacobian(). My problem arises when I try to run the gradient with the multidimensional arrays that my model weights and input data come in.

import theano
import theano.tensor as T

weights_in_model = T.dvector('model_weights')
x = T.dvector('x')

def pred(x,weights_in_model):
    weights = T.stack((weights_in_model[0],weights_in_model[1]), axis=0)
    x = T.shape_padright(x, n_ones=1)

    prediction=T.dot(x, weights)
    prediction = T.clip(prediction, 0, 9999.)

    weights = T.stack((weights_in_model[2],weights_in_model[3]), axis=0)
    prediction = T.shape_padright(prediction, n_ones=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)

    weights = T.stack((weights_in_model[4],weights_in_model[5]), axis=0)
    prediction = T.shape_padright(prediction, n_ones=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)

    weights = T.stack((weights_in_model[6],weights_in_model[7]), axis=0)
    prediction = T.shape_padright(prediction, n_ones=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)

    weights = T.stack((weights_in_model[8],weights_in_model[9]), axis=0)
    prediction = T.shape_padright(prediction, n_ones=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)

    weights = T.stack((weights_in_model[10],weights_in_model[11]), axis=0)
    prediction = T.shape_padright(prediction, n_ones=1)
    prediction = T.dot(prediction, weights)
    prediction = T.clip(prediction, 0, 9999.)


    weights = T.stack((weights_in_model[12],weights_in_model[13]), axis=0)
    prediction = T.shape_padright(prediction, n_ones=1)
    prediction = T.dot(prediction, weights)
    T.flatten(prediction)

    return prediction


f=theano.gradient.jacobian(pred(x,weights_in_model),wrt=x)
h=theano.function([x,weights_in_model],f,allow_input_downcast=True)


x = train_X
weights_in_model = model.get_weights()
h(x,weights_in_model)

The last line gives the error:

TypeError: ('Bad input argument to theano function with name "<ipython-input-365-a1ab256aa220>:1"  at index 0(0-based)', 'Wrong number of dimensions: expected 1, got 2 with shape (2000, 7213).')

But when I change the inputs to:

weights_in_model = T.matrix('model_weights')
x = T.matrix('x')

I get an error on the line:

f=theano.gradient.jacobian(pred(x,weights_in_model),wrt=x)

which reads:

AssertionError: tensor.jacobian expects a 1 dimensional variable as `expression`. If not use flatten to make it a vector

Any ideas on how to get around this?

Answer

ANSWER FOUND! The following code works for predicting one output value from the model. Currently I am working on modifying it to compute 3 Jacobian matrices, one for each output.

import theano
import theano.tensor as T
import theano.typed_list
theano.config.optimizer='fast_compile'
theano.config.exception_verbosity='high'

# Declare function input placeholders
weights_in_model = theano.typed_list.TypedListType(theano.tensor.dmatrix)()
x = T.matrix('x')

# Define model function: seven Dense layers, with a ReLU-like clip after all
# but the last. weights_in_model holds [W0, b0, W1, b1, ...] with each bias
# reshaped to a row vector, so stacking a column of ones onto the activations
# absorbs the bias into a single matrix product.
def pred(x, weights_in_model):
    prediction = x
    for i in range(0, 14, 2):
        weights = T.concatenate((weights_in_model[i], weights_in_model[i + 1]), axis=0)
        prediction = T.concatenate((prediction, T.ones((T.shape(prediction)[0], 1))), axis=1)
        prediction = T.dot(prediction, weights)
        if i < 12:  # no clipping after the final linear layer
            prediction = T.clip(prediction, 0, 9999.)
    return T.flatten(prediction)

# Create gradient function
f=theano.gradient.jacobian(pred(x,weights_in_model),wrt=x)

# Compile function
h=theano.function([x,weights_in_model],f,allow_input_downcast=True)


# Get function inputs
weights_in_model_ = model.get_weights()
x_=train_data

# Reshape bias layers
weights_in_model_[1] = np.reshape(weights_in_model_[1], (1, 2048))
weights_in_model_[3] = np.reshape(weights_in_model_[3], (1, 1024))
weights_in_model_[5] = np.reshape(weights_in_model_[5], (1, 512))
weights_in_model_[7] = np.reshape(weights_in_model_[7], (1, 256))
weights_in_model_[9] = np.reshape(weights_in_model_[9], (1, 128))
weights_in_model_[11] = np.reshape(weights_in_model_[11], (1, 32))
weights_in_model_[13] = np.reshape(weights_in_model_[13], (1, 1))

# Compute Jacobian (returns format with a bunch of zero rows)
jacs = h(x_, weights_in_model_)

# Put the Jacobian in its proper format, i.e. shape
# (number_of_input_examples, number_of_input_features): the informative
# entries sit on the diagonal of the first two axes.
jacobian_matrix = np.zeros((jacs.shape[0], jacs.shape[2]))
for i, jac in enumerate(jacs):
    jacobian_matrix[i] = jac[i]
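A hedged sketch of the three-output extension mentioned above: if the final weight/bias pair is given shapes (32, 3) and (1, 3), and the final T.flatten is dropped so the model function returns an (N, 3) matrix (call that variant pred_2d, a hypothetical name), then each output column is a 1-D variable and can be differentiated separately:

# pred_2d: same graph as pred above, minus the final flatten, with the last
# layer widened to 3 outputs (0: temperature, 1: gravity, 2: metallicity).
prediction = pred_2d(x, weights_in_model)

jac_fns = []
for k in range(3):
    # prediction[:, k] is 1-D, which is what tensor.jacobian requires.
    f_k = theano.gradient.jacobian(prediction[:, k], wrt=x)
    jac_fns.append(theano.function([x, weights_in_model], f_k,
                                   allow_input_downcast=True))

Each compiled function returns an array of shape (N, N, 7213), and the informative rows again sit on the diagonal, exactly as in the single-output case above.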

Next task is to find the Hessian matrix of the outputs w.r.t. the model weights!
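A hedged starting point for that: theano.gradient.hessian requires a scalar cost and a vector wrt, so the weights would first have to be expressed as one flat dvector and sliced/reshaped inside the graph (w_flat and build_pred below are hypothetical names, not defined above):

w_flat = T.dvector('w_flat')   # all model weights, flattened into one vector
y = T.matrix('y')

# build_pred would reconstruct the layer weight matrices by slicing w_flat.
prediction = build_pred(x, w_flat)
cost = T.mean(T.sqr(prediction - y))   # scalar MSE

H = theano.gradient.hessian(cost, wrt=w_flat)
hess_fn = theano.function([x, y, w_flat], H, allow_input_downcast=True)

As with the scipy route in the question, the dense Hessian is only tractable for a small model or a subset of the weights; for this network w_flat would have on the order of 17 million entries.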
