How to update model parameters with accumulated gradients?


Question

I'm using TensorFlow to build a deep learning model, and I'm new to TensorFlow.

For certain reasons my model is limited to a small batch size, and this small batch size gives the training high variance.

So I want to use a trick to make the effective batch size larger. My idea is to store the gradients of each mini-batch, for example 64 mini-batches, sum those gradients together, and use the mean gradient over these 64 mini-batches of training data to update the model's parameters.

This means that for the first 63 mini-batches the parameters are not updated, and only after the 64th mini-batch are the model's parameters updated, once.

But since TensorFlow is graph based, does anyone know how to implement this feature?

Thanks a lot.

Answer

I found a solution here: https://github.com/tensorflow/tensorflow/issues/3994#event-766328647

opt = tf.train.AdamOptimizer()
tvs = tf.trainable_variables()
# One zero-initialized, non-trainable accumulator variable per trainable variable.
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
# Op that resets all accumulators to zero.
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
# (gradient, variable) pairs for the loss `rmse`.
gvs = opt.compute_gradients(rmse, tvs)
# Op that adds the current mini-batch's gradients to the accumulators.
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
# Op that applies the accumulated gradients to the corresponding variables.
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

And in the training loop:

while True:
    sess.run(zero_ops)                     # reset the accumulators
    for i in range(n_minibatches):         # accumulate gradients over n_minibatches mini-batches
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    sess.run(train_step)                   # apply the accumulated gradients once
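
Note that this snippet applies the sum of the accumulated gradients rather than their mean. If you want the mean gradient described in the question, one option (a minimal sketch, assuming n_minibatches is a plain Python number known at graph-construction time) is to divide each accumulator before applying it, building this train_step in place of the one above:

# Sketch: apply the mean of the accumulated gradients instead of their sum.
mean_gvs = [(accum_vars[i] / float(n_minibatches), gv[1]) for i, gv in enumerate(gvs)]
train_step = opt.apply_gradients(mean_gvs)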

But this code doesn't look very clean or pretty; does anyone know how to optimize it?
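
For reference, the same idea can be written more compactly in TensorFlow 2 with eager execution and tf.GradientTape. The following is only a sketch, not the original answer; the tiny Dense model, MSE loss, and random dataset are stand-ins for your own model, loss, and input pipeline.

import tensorflow as tf

# Placeholder model, loss, and data; substitute your own.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
loss_fn = tf.keras.losses.MeanSquaredError()
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 8]), tf.random.normal([1024, 1]))).batch(16)

optimizer = tf.keras.optimizers.Adam()
n_minibatches = 64  # number of mini-batches to accumulate before each update

# One zero-initialized, non-trainable accumulator per trainable variable.
accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
               for v in model.trainable_variables]

for step, (x, y) in enumerate(dataset):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for acc, g in zip(accum_grads, grads):
        acc.assign_add(g)                  # accumulate this mini-batch's gradients
    if (step + 1) % n_minibatches == 0:
        # Apply the mean of the accumulated gradients, then reset the accumulators.
        optimizer.apply_gradients(
            [(acc / n_minibatches, v)
             for acc, v in zip(accum_grads, model.trainable_variables)])
        for acc in accum_grads:
            acc.assign(tf.zeros_like(acc))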
