Use Scipy Optimizer with Tensorflow 2.0 for Neural Network training


Problem description


After the introduction of Tensorflow 2.0 the scipy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. However, I would still like to use the scipy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (keras model sequential). In order for the optimizer to work, it requires as input a function fun(x0) with x0 being an array of shape (n,). Therefore, the first step would be to "flatten" the weight matrices to obtain a vector of the required shape. To this end, I modified the code provided by https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory meant to create such a function fun(x0). However, the code does not seem to work and the loss function does not decrease. I would be really grateful if someone could help me work this out.
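
For illustration, here is a minimal sketch (separate from the actual model code below) of how tf.dynamic_stitch can flatten a set of parameter tensors into one 1D vector, which is the idea the function factory relies on:

import tensorflow as tf

# Toy illustration: two parameter tensors are stitched into a single 1D vector
# using index tensors that map each entry to its position in the flat vector.
w = tf.constant([[1., 2.], [3., 4.]])        # shape (2, 2)
b = tf.constant([5., 6.])                    # shape (2,)

idx_w = tf.reshape(tf.range(0, 4), (2, 2))   # flat positions 0..3
idx_b = tf.range(4, 6)                       # flat positions 4..5

flat = tf.dynamic_stitch([idx_w, idx_b], [w, b])
print(flat.numpy())                          # [1. 2. 3. 4. 5. 6.]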


Here is the piece of code I am using:

import numpy as np
import scipy.optimize
import tensorflow as tf

# function_factory and loss_function are defined below
func = function_factory(model, loss_function, x_u_train, u_train)

# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')


def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)


def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss [in]: a function with signature loss_value = loss(pred_y, true_y).
        train_x [in]: the input part of training data.
        train_y [in]: the output part of training data.

    Returns:
        A function that has a signature of:
            loss_value, gradients = f(model_parameters).
    """

    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare required information first
    count = 0
    idx = [] # stitch indices
    part = [] # partition indices

    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count+n, dtype=tf.int32), shape))
        part.extend([i]*n)
        count += n

    part = tf.constant(part)


    def assign_new_model_parameters(params_1d):
        """A function updating the model's parameters with a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """

        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):

            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create a function that will be returned by this factory

    def f(params_1d):
        """
        This function is created by function_factory.
        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """

        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)

        return loss_value

    # store this information as members so we can use it outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f


Here, model is a tf.keras.Sequential object.
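
The exact architecture is not essential here; a hypothetical stand-in (the layer sizes and input dimension below are only illustrative, they are not given in the question) could look like:

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh', input_shape=(2,)),  # illustrative input dimension
    tf.keras.layers.Dense(20, activation='tanh'),
    tf.keras.layers.Dense(1),
])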

Thanks in advance for your help!

Answer


Changing from TF1 to TF2 I was faced with the same question, and after a little bit of experimenting I found the solution below, which shows how to establish the interface between a function decorated with tf.function and a scipy optimizer. The important changes compared to the question are:

  1. As pointed out by Ives, scipy's lbfgs needs to receive both the function value and the gradient, so you have to provide a function that delivers both and then set jac=True.
  2. scipy's lbfgs is a Fortran routine that expects the interface to provide np.float64 arrays, whereas the tensorflow tf.function works with tf.float32, so the inputs and outputs have to be cast accordingly.


Below I provide an example of how this can be done for a toy problem.

import tensorflow as tf
import numpy as np
import scipy.optimize as sopt

def model(x):
    return tf.reduce_sum(tf.square(x-tf.constant(2, dtype=tf.float32)))

@tf.function
def val_and_grad(x):
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = model(x)
    grad = tape.gradient(loss, x)
    return loss, grad

def func(x):
    # scipy passes x as a float64 numpy array; cast to float32 for the tf.function
    # and convert the returned loss and gradient back to float64 for scipy
    return [vv.numpy().astype(np.float64) for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]

resdd = sopt.minimize(fun=func, x0=np.ones(5),
                      jac=True, method='L-BFGS-B')

print("info:\n",resdd)

which outputs:

info:
       fun: 7.105427357601002e-14
 hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
      jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
       -2.38418579e-07])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 3
      nit: 2
   status: 0
  success: True
        x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])
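
The same two changes carry over to the question's setting with a Keras model. Below is a sketch, not verbatim code from this answer, of how the question's function_factory could be wrapped so that scipy's L-BFGS-B receives both the loss and a flattened gradient; the name make_scipy_func and the exact wiring are illustrative:

def make_scipy_func(model, loss_f, x_u_train, u_train):
    # reuse the index bookkeeping and assignment helper from the question's factory
    factory = function_factory(model, loss_f, x_u_train, u_train)

    @tf.function
    def val_and_grad(params_1d):
        with tf.GradientTape() as tape:
            factory.assign_new_model_parameters(params_1d)
            loss = loss_f(x_u_train, u_train, model)
        grads = tape.gradient(loss, model.trainable_variables)
        # flatten the per-variable gradients into one 1D vector
        return loss, tf.dynamic_stitch(factory.idx, grads)

    def scipy_func(x):
        # scipy works with float64 numpy arrays, the model with float32 tensors
        loss, grad = val_and_grad(tf.constant(x, dtype=tf.float32))
        return loss.numpy().astype(np.float64), grad.numpy().astype(np.float64)

    scipy_func.idx = factory.idx
    return scipy_func

nn_func = make_scipy_func(model, loss_function, x_u_train, u_train)
x0 = tf.dynamic_stitch(nn_func.idx, model.trainable_variables).numpy().astype(np.float64)
res_nn = sopt.minimize(fun=nn_func, x0=x0, jac=True, method='L-BFGS-B')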



Benchmark

For comparing speed I use the lbfgs optimizer for a style transfer problem (see here for the network). Note that for this problem the network parameters are fixed and the input signal is adapted. As the optimized parameters (the input signal) are 1D, the function factory is not needed.

I compared four implementations:

  1. TF1.12: TF1 with ScipyOptimizerInterface
  2. TF2.0 (E): the approach above without the tf.function decorator
  3. TF2.0 (G): the approach above with the tf.function decorator
  4. TF2.0/TFP: using the lbfgs minimizer from tensorflow_probability (see the sketch after this list)
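
For reference, option 4 can be sketched on the toy problem above, reusing val_and_grad, since tfp.optimizer.lbfgs_minimize expects exactly such a function returning loss and gradient together with a 1D starting position:

import tensorflow_probability as tfp

tfp_results = tfp.optimizer.lbfgs_minimize(
    value_and_gradients_function=val_and_grad,
    initial_position=tf.ones(5),
    max_iterations=300)
print(tfp_results.position.numpy())   # approaches [2. 2. 2. 2. 2.]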


For this comparison the optimization is stopped after 300 iterations (generally for convergence the problem requires 3000 iterations)

Method       runtime(300it)      final loss         
TF1.12          240s                0.045     (baseline)
TF2.0 (E)       299s                0.045
TF2.0 (G)       233s                0.045
TF2.0/TFP       226s                0.053


The TF2.0 eager mode (TF2.0(E)) works correctly but is about 20% slower than the TF1.12 baseline version. TF2.0(G) with tf.function works fine and is marginally faster than TF1.12, which is a good thing to know.


The optimizer from tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0 (G) using scipy's lbfgs, but does not achieve the same error reduction. In fact, the decrease of the loss over time is not monotonic, which seems a bad sign. Comparing the two implementations of lbfgs (scipy and tensorflow_probability = TFP), it is clear that the Fortran code in scipy is significantly more complex. So either the simplified algorithm in TFP is hurting here, or the fact that TFP performs all calculations in float32 may also be a problem.
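
If float32 precision is the suspect, one way to check it (a sketch only, not something benchmarked here) is to run the toy problem through tfp.optimizer.lbfgs_minimize entirely in float64, since the TFP implementation works in whatever dtype the supplied function and starting position use:

@tf.function
def val_and_grad64(x):
    # float64 variant of the toy problem's value-and-gradient function
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.reduce_sum(tf.square(x - tf.constant(2, dtype=tf.float64)))
    return loss, tape.gradient(loss, x)

res64 = tfp.optimizer.lbfgs_minimize(
    value_and_gradients_function=val_and_grad64,
    initial_position=tf.ones(5, dtype=tf.float64),
    max_iterations=300)
print(res64.objective_value.numpy())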

