How to train a big model with relatively large batch size on a single GPU using Tensorflow?


Problem Description


I have a very big model which cannot be trained on a single GPU with batch size 64 due to an out-of-memory error. Someone suggested that I use a smaller batch size. However, if I decrease my batch size, the accuracy drops. One solution is to feed only half of the current batch, store the gradients, and then feed the remaining half. This can be done explicitly by using compute_gradients and apply_gradients, but it is relatively inconvenient (it is OK if a concise implementation exists). So I wonder if there is a nicer solution (or a concise implementation) to this problem.
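For reference, a minimal sketch of that accumulation approach in TF1 graph mode might look like the following (the placeholder model, shapes, and names are illustrative, not from the original question):

import tensorflow as tf

# Illustrative model; only the accumulation machinery below matters.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])
logits = tf.layers.dense(x, 10)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

optimizer = tf.train.AdamOptimizer(1e-3)
tvars = tf.trainable_variables()

# One non-trainable accumulator per trainable variable, initialised to zero.
accums = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
          for v in tvars]

grads_and_vars = optimizer.compute_gradients(loss, tvars)

# Add the current half-batch's gradients into the accumulators.
accum_op = tf.group(*[a.assign_add(g)
                      for a, (g, _) in zip(accums, grads_and_vars)
                      if g is not None])

micro_batches = 2  # two halves of the logical batch of 64
# Apply the averaged accumulated gradients, then reset the accumulators.
apply_op = optimizer.apply_gradients(
    [(a / micro_batches, v) for a, v in zip(accums, tvars)])
zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accums])

Per logical batch, one would run accum_op once for each half, then apply_op, and finally zero_op in a separate session.run call so the reset happens after the update.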

Thanks in advance.

Recommended Answer


You may consider looking into this: https://github.com/openai/gradient-checkpointing.


There has been a lot of research lately on making backprop more memory-efficient at the expense of additional forward passes. This is a very recent implementation of one such scheme for TensorFlow.
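As a related illustration of the same recompute-instead-of-store idea, more recent TensorFlow releases ship tf.recompute_grad, which wraps part of a model so its intermediate activations are recomputed during the backward pass rather than kept in memory. A minimal sketch (layer sizes and names are illustrative, not from the linked repository) might look like this:

import tensorflow as tf

# Illustrative wide layers; activations produced inside `block` are recomputed
# during the backward pass instead of being stored, lowering peak memory.
dense1 = tf.keras.layers.Dense(4096, activation="relu")
dense2 = tf.keras.layers.Dense(4096, activation="relu")
head = tf.keras.layers.Dense(10)

# Build the layers once so their variables exist before the wrapped call.
_ = head(dense2(dense1(tf.zeros([1, 1024]))))

block = tf.recompute_grad(lambda t: dense2(dense1(t)))

optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, head(block(x)))
    variables = (dense1.trainable_variables + dense2.trainable_variables
                 + head.trainable_variables)
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss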
