How to train a big model with relatively large batch size on a single GPU using Tensorflow?


Problem Description


I have a very big model which cannot be trained on a single GPU with batch size 64 due to an out-of-memory error. Someone suggested that I use a smaller batch size. However, if I decrease my batch size, the accuracy drops. One solution is to feed only half of the current batch, store the gradients, and then feed the remaining half. This can be done explicitly by using compute_gradients and apply_gradients, but it is relatively inconvenient (it is OK if a concise implementation exists). So I wonder if there is a nicer solution (or a concise implementation) to this problem.
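For reference, a minimal sketch of that accumulation approach in TF1 graph mode might look like the following (the placeholder model, shapes, and names are illustrative, not from the original question):

import tensorflow as tf

# Illustrative model; only the accumulation machinery below matters.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])
logits = tf.layers.dense(x, 10)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

optimizer = tf.train.AdamOptimizer(1e-3)
tvars = tf.trainable_variables()

# One non-trainable accumulator per trainable variable, initialised to zero.
accums = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
          for v in tvars]

grads_and_vars = optimizer.compute_gradients(loss, tvars)

# Add the current half-batch's gradients into the accumulators.
accum_op = tf.group(*[a.assign_add(g)
                      for a, (g, _) in zip(accums, grads_and_vars)
                      if g is not None])

micro_batches = 2  # two halves of the logical batch of 64
# Apply the averaged accumulated gradients, then reset the accumulators.
apply_op = optimizer.apply_gradients(
    [(a / micro_batches, v) for a, v in zip(accums, tvars)])
zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accums])

Per logical batch, one would run accum_op once for each half, then apply_op, and finally zero_op in a separate session.run call so the reset happens after the update.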

Thanks in advance.

Recommended Answer


You may consider looking into this: https://github.com/openai/gradient-checkpointing.


There has been a lot of research lately on making backprop more memory-efficient at the expense of additional forward passes. This is a very recent implementation of one such scheme for TensorFlow.
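As a related illustration of the same recompute-instead-of-store idea, more recent TensorFlow releases ship tf.recompute_grad, which wraps part of a model so its intermediate activations are recomputed during the backward pass rather than kept in memory. A minimal sketch (layer sizes and names are illustrative, not from the linked repository) might look like this:

import tensorflow as tf

# Illustrative wide layers; activations produced inside `block` are recomputed
# during the backward pass instead of being stored, lowering peak memory.
dense1 = tf.keras.layers.Dense(4096, activation="relu")
dense2 = tf.keras.layers.Dense(4096, activation="relu")
head = tf.keras.layers.Dense(10)

# Build the layers once so their variables exist before the wrapped call.
_ = head(dense2(dense1(tf.zeros([1, 1024]))))

block = tf.recompute_grad(lambda t: dense2(dense1(t)))

optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, head(block(x)))
    variables = (dense1.trainable_variables + dense2.trainable_variables
                 + head.trainable_variables)
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss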
