What's different about momentum gradient update in Tensorflow and Theano like this?


Question


I'm trying to use TensorFlow for my deep learning project.
Here I need to implement my gradient update with this formula:
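(The formula image did not survive extraction. Judging from the Theano snippet below, the intended update rule appears to be the following, in my own notation, with μ the momentum, η the learning rate, and λ the weight-cost strength; this is a reconstruction, not the original image:)

```latex
v \leftarrow \mu\, v + (1 - \mu)\,\nabla_\theta C,
\qquad
\theta \leftarrow \theta - \eta\,(1 + \lambda)\, v
```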


I have also implemented this part in Theano, and it produced the expected answer. But when I try to use TensorFlow's MomentumOptimizer, the result is really bad. I don't know what is different between them.

Theano:

def gradient_updates_momentum_L2(cost, params, learning_rate, momentum, weight_cost_strength):
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param in params:
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        updates.append((param, param - learning_rate*(param_update + weight_cost_strength * param_update)))
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates

TensorFlow:

l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
cost = cost + WEIGHT_COST_STRENGTH * l2_loss
train_op = tf.train.MomentumOptimizer(LEARNING_RATE, MOMENTUM).minimize(cost)

Answer


If you look at the implementation of the momentum optimizer in TensorFlow [link], it is implemented as follows:

accum = accum * momentum() + grad;
var -= accum * lr();
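To see how far apart the two rules drift, here is a plain-Python sketch comparing them on a single scalar parameter with a constant gradient. It ignores the L2 term and Theano's simultaneous-update semantics, and the helper names are illustrative, not from either library:

```python
# Compare the two momentum update rules on one scalar parameter.

def theano_style_step(param, accum, grad, lr, momentum):
    # Theano snippet's rule: accum is an exponential moving average
    # of the gradient, i.e. the (1 - momentum) factor is applied.
    accum = momentum * accum + (1.0 - momentum) * grad
    return param - lr * accum, accum

def tf_style_step(var, accum, grad, lr, momentum):
    # TensorFlow MomentumOptimizer's rule: accum accumulates the
    # raw (unscaled) gradient.
    accum = momentum * accum + grad
    return var - lr * accum, accum

p1, a1 = 1.0, 0.0
p2, a2 = 1.0, 0.0
for _ in range(3):
    p1, a1 = theano_style_step(p1, a1, grad=0.5, lr=0.1, momentum=0.9)
    p2, a2 = tf_style_step(p2, a2, grad=0.5, lr=0.1, momentum=0.9)

print(p1)  # ~0.97195
print(p2)  # ~0.71950 -- a much larger step for the same settings
```

With momentum 0.9, the TensorFlow-style step is roughly 1/(1 − momentum) = 10× larger here, which would explain training blowing up under settings tuned for the Theano rule.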


As you can see, the formulas are a bit different. Scaling the momentum term by the learning rate should resolve your differences.


It is also very easy to implement such an optimizer yourself. The resulting code would look similar to the Theano snippet you included.
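Alternatively, because the Theano-style accumulator carries an extra factor of (1 − momentum), passing an effective learning rate of lr × (1 − momentum) to the TensorFlow-style rule should reproduce the Theano-style trajectory. A plain-Python sketch under the same simplifications (no L2 term, sequential updates; helper names are mine):

```python
# Rescaling the learning rate by (1 - momentum) reconciles the
# TensorFlow-style rule with the Theano-style one.

def theano_style_step(param, accum, grad, lr, momentum):
    accum = momentum * accum + (1.0 - momentum) * grad
    return param - lr * accum, accum

def tf_style_step(var, accum, grad, lr, momentum):
    accum = momentum * accum + grad
    return var - lr * accum, accum

lr, momentum = 0.1, 0.9
p1, a1 = 1.0, 0.0
p2, a2 = 1.0, 0.0
for step in range(5):
    grad = 0.5 / (step + 1)  # any gradient sequence works
    p1, a1 = theano_style_step(p1, a1, grad, lr, momentum)
    # Same rule as MomentumOptimizer, but with lr scaled by (1 - momentum).
    p2, a2 = tf_style_step(p2, a2, grad, lr * (1.0 - momentum), momentum)

print(abs(p1 - p2))  # ~0: the two trajectories coincide
```

This matches the answer's point: the discrepancy is only a constant rescaling of the momentum term, not a fundamentally different algorithm.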

