Simple gradient descent using mxnet

Question

I'm trying to use MXNet's gradient descent optimizers to minimize a function. The equivalent example in TensorFlow would be:

import tensorflow as tf

x = tf.Variable(2, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)

init = tf.initialize_all_variables()

def optimize():
  with tf.Session() as session:
    session.run(init)
    print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
    for step in range(10):  
      session.run(train)
      print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))

I am not sure how to accomplish the same in MXNet. The Optimizer API documentation does not appear to have an equivalent method. Here is what I have tried so far; the main confusion has been around the need to pass training data:

import mxnet as mx

x = mx.sym.Variable('data')
log_x = mx.sym.log(x)
log_x_squared = mx.sym.square(log_x)

mod = mx.mod.Module(log_x_squared)  # Create a module where the loss function
                                    # is the one we want to optimize
mod.bind(data_shapes=[('data', (1,1))])  # ?? not sure if this is correct - we
                                         # are saying our input is a scalar
mod.init_params()
mod.init_optimizer()  # SGD is default

mod.fit()  # ?? must pass data_iter to fit

It seems like the x variable should somehow be fed back in as the data_iter, but I don't know how to accomplish this.
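
As a quick sanity check (a minimal sketch, assuming the symbols defined in the snippet above), listing the graph's arguments shows that 'data' is its only argument; data inputs are not trained by the optimizer, so fit() would have nothing to update in this formulation:

# With x declared as a data Variable, there are no trainable weights
# in the graph for the optimizer to move.
print(log_x_squared.list_arguments())  # -> ['data']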

Update: thanks to kevinthesun for their excellent answer! Here is a working minimization routine built on top of a single-hidden-layer neural net:

import mxnet as mx
import numpy as np


def minimize(objective_function,
             initial_params,
             max_iters=1000,
             optimizer='sgd',
             optimizer_params=(('learning_rate', 0.1),),
             tol=1e-8):

    class InitialParam(mx.init.Initializer):

        def __init__(self, vals):
            super(InitialParam, self).__init__()
            self._vals = vals

        def _init_weight(self, _, arr):
            arr[:] = self._vals.asnumpy()[:, np.newaxis]


    x = mx.sym.Variable('data')
    params_len = initial_params.shape[0]
    fc = mx.sym.FullyConnected(data=x, name='fc1',
                               num_hidden=params_len,
                               no_bias=True)

    # The FullyConnected output is awkward to pass into the objective
    # function directly. If it represents [x, y] for a 2-dimensional
    # function f(x, y), it is easier to work with x and y separately,
    # so we slice it into one symbol per parameter:
    param_syms = []
    for i in range(params_len):
        ps = mx.sym.slice(fc, begin=(0, i), end=(1, i + 1))
        param_syms.append(ps)

    # The loss function for the network is our objective function.
    loss = mx.sym.MakeLoss(objective_function(param_syms))
    mod = mx.mod.Module(loss)

    mod.bind(data_shapes=[('data', (1,))])
    mod.init_params(InitialParam(initial_params))
    mod.init_optimizer(optimizer=optimizer,
                       optimizer_params=optimizer_params)

    (o_name, o_shape), = mod.output_shapes

    i = 0
    params = initial_params
    old_val = np.full(o_shape, np.nan)
    while i < max_iters:
        mod.forward_backward(mx.io.DataBatch(
            data=[mx.nd.ones((1,))])) 
        mod.update()
        params = mod.get_params()[0]['fc1_weight']
        val = mod.get_outputs()[0].asnumpy()
        if np.allclose(old_val, val, atol=tol):
            print('Function value: {}'.format(val))
            print('Iterations: {}'.format(i))
            return params

        old_val = val
        i += 1

    return params

And to use it:

def my_func(x):
    return (x[0] + 1) ** 2

p = minimize(my_func, mx.nd.array([1.0]))
p.asnumpy()

>>> array([[-0.99999988]], dtype=float32)

And another:

def my_func(x):
    return (x[0] + 1) ** 2 + (x[1] - 2) ** 2 + (x[2] + 3) ** 2

p = minimize(my_func, mx.nd.array([1.0, 1.5, 2.0]))
p.asnumpy()

>>> array([[-0.99996436],
           [ 1.99999106],
           [-2.99991083]], dtype=float32)

Answer

Currently it is not as easy as in TensorFlow to optimize a simple function with MXNet, due to a lack of support for this in the frontend.

First, you need a loss function as the last layer of your network; here that is log_x_squared. Use MakeLoss to create the loss function.
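
For instance, a minimal sketch of wrapping the objective from the question in MakeLoss (reusing the symbol names from above) might look like:

import mxnet as mx

x = mx.sym.Variable('data')
log_x_squared = mx.sym.square(mx.sym.log(x))

# MakeLoss marks this output as the training objective, so its gradient
# is what the optimizer follows during backward.
loss = mx.sym.MakeLoss(log_x_squared)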

Second is the input and the weights. Since a Variable is not currently counted as a trainable weight in MXNet, you need to set x up as a weight. Here is a workaround: define a 'fake' input variable that is always fed the value 1, then add a FullyConnected layer with 1 hidden unit and no bias. This gives us "1 * x", and now x is a weight.
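
A minimal sketch of that workaround (the layer name 'fc' is my own choice) could be:

data = mx.sym.Variable('data')  # 'fake' input, always fed the constant 1
# One hidden unit and no bias: the layer computes 1 * w, so the weight
# 'fc_weight' plays the role of x and is trainable.
fc = mx.sym.FullyConnected(data=data, name='fc', num_hidden=1, no_bias=True)
loss = mx.sym.MakeLoss(mx.sym.square(mx.sym.log(fc)))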

Third, if you would like to optimize multiple times on a single data sample, module.fit might not be the best choice. After initializing the optimizer, you just need to call module.forward_backward() and module.update() repeatedly. forward_backward takes a DataBatch, which is a simpler interface than a DataIter; here we simply pass a constant ndarray of 1 each time.
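
Putting the pieces together, a hedged sketch of the update loop (continuing the fc-based symbols from the sketch above; starting x at 2 with mx.init.Constant mirrors the TensorFlow example and is my own assumption) might be:

mod = mx.mod.Module(loss, data_names=['data'], label_names=None)
mod.bind(data_shapes=[('data', (1, 1))])
mod.init_params(mx.init.Constant(2.0))  # assumed: any initializer giving x = 2 works
mod.init_optimizer(optimizer='sgd',
                   optimizer_params=(('learning_rate', 0.5),))

batch = mx.io.DataBatch(data=[mx.nd.ones((1, 1))])  # the constant "1" input
for step in range(10):
    mod.forward_backward(batch)
    mod.update()
    x_val = mod.get_params()[0]['fc_weight'].asnumpy()
    print(step, 'x:', x_val, 'log(x)^2:', mod.get_outputs()[0].asnumpy())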

In effect, we have constructed a computation graph of log(1 * x)^2, and x has become a weight instead of a variable.
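
To make that concrete, listing the arguments of the symbol sketched above shows that only the fully connected weight is a trainable parameter, while 'data' is just an input:

print(loss.list_arguments())  # -> ['data', 'fc_weight']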

That said, we should consider providing a TensorFlow-like interface for optimizing variables directly.

Hope this helps!
