How to update model parameters with accumulated gradients?


Question

I'm using TensorFlow to build a deep learning model, and I'm new to TensorFlow.

For some reason, my model is limited to a small batch size, and this small batch size causes the model to have high variance.

So I want to use a trick to make the effective batch size larger. My idea is to store the gradients of each mini-batch, for example for 64 mini-batches, sum them, and then use the mean gradient over those 64 mini-batches of training data to update the model's parameters.

This means that the parameters are not updated during the first 63 mini-batches; after the 64th mini-batch, the model's parameters are updated only once.
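
In symbols, the update described above is roughly the following. The notation is introduced here for illustration and is not part of the original question: K = 64 is the number of accumulated mini-batches, g_k is the gradient of the loss on mini-batch k, θ stands for the model parameters, and η is a learning rate. Plain gradient descent is assumed for simplicity, although any optimizer could consume the averaged gradient.

```latex
\bar{g} = \frac{1}{K} \sum_{k=1}^{K} g_k,
\qquad
\theta \leftarrow \theta - \eta \, \bar{g}
```

Because θ does not change while the K gradients are collected, and assuming all mini-batches have the same size and the loss is a mean over examples, the averaged gradient equals the gradient computed on the combined batch of all K mini-batches.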

But since TensorFlow is graph-based, does anyone know how to implement this?

Thanks a lot.

Recommended answer

I found a solution here: https://github.com/tensorflow/tensorflow/issues/3994#event-766328647

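import tensorflow as tf  # TensorFlow 1.x graph-mode API
opt = tf.train.AdamOptimizer()
tvs = tf.trainable_variables()
# One non-trainable accumulator per trainable variable, initialized to zeros
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
# Ops that reset every accumulator to zero
zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]
# rmse is the loss tensor, defined elsewhere
gvs = opt.compute_gradients(rmse, tvs)
# Ops that add the current mini-batch gradient to its accumulator
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
# Apply the accumulated (summed) gradients in a single update
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])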

In the training loop:

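while True:
    sess.run(zero_ops)  # reset the accumulators
    for i in range(n_minibatches):
        # accumulate the gradient of mini-batch i
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    sess.run(train_step)  # one parameter update per n_minibatches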
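
Note that the graph code above sums the mini-batch gradients, whereas the question asks for their mean; dividing the accumulated value by the number of mini-batches before applying it would give the mean. The following is a minimal framework-free sketch of the same zero / accumulate / apply pattern with that averaging step. It is not TensorFlow code: it assumes a one-parameter model y = w * x with a mean squared error loss and plain gradient descent instead of Adam, and the names `minibatch_gradient` and `accumulated_step` are hypothetical helpers introduced only for illustration.

```python
# Framework-free sketch of gradient accumulation (not TensorFlow code).
# Assumptions: model y = w * x, mean squared error loss, plain SGD update.

def minibatch_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w on one mini-batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_step(w, minibatches, learning_rate):
    """Accumulate gradients over all mini-batches, then update w once."""
    accum = 0.0                                  # plays the role of zero_ops
    for xs, ys in minibatches:
        accum += minibatch_gradient(w, xs, ys)   # plays the role of accum_ops
    mean_grad = accum / len(minibatches)         # average instead of plain sum
    return w - learning_rate * mean_grad         # plays the role of train_step


# Four mini-batches of two examples each, generated from y = 3 * x.
data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
data_y = [3.0 * x for x in data_x]
minibatches = [(data_x[i:i + 2], data_y[i:i + 2]) for i in range(0, 8, 2)]

w = 0.0
for _ in range(200):
    w = accumulated_step(w, minibatches, learning_rate=0.01)

# With equal-sized mini-batches, the averaged gradient should match the
# gradient computed on the full batch at the same parameter value.
full_grad = minibatch_gradient(0.0, data_x, data_y)
mean_grad = sum(minibatch_gradient(0.0, xs, ys) for xs, ys in minibatches) / 4
```

In this sketch `w` is updated once per pass over the four mini-batches, and `full_grad` and `mean_grad` can be compared to check that averaging the accumulated gradients behaves like training on the larger combined batch.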

But this code doesn't look very clean. Does anyone know how to improve it?
