Simple TensorFlow example loading one copy of a model onto each GPU that's available
Question
I'm looking at porting from a different production machine learning framework to TensorFlow. In our current system, for both training and inference, we load copies of our model onto as many GPUs as are on the machine.
I would like to keep this way of load balancing for now. Where can I find a simple example of loading one copy of a TF model onto each GPU that's available on a machine?
Answer
Here is an example from https://github.com/rafaljozefowicz/lm/blob/master/language_model.py#L21: you wrap your model creation code into a _forward function, and then call it once for each GPU:
for i in range(hps.num_gpus):
    with tf.device(assign_to_gpu(i, ps_device)), tf.variable_scope(tf.get_variable_scope(),
                                                                   reuse=True if i > 0 else None):
        loss = self._forward(i, xs[i], ys[i], ws[i])
        losses += [loss]
        if mode == "train":
            cur_grads = self._backward(loss, summaries=(i == hps.num_gpus - 1))
            tower_grads += [cur_grads]
self.loss = tf.add_n(losses) / len(losses)
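The assign_to_gpu helper used above is not shown in the snippet; in the linked repo it returns a device function that pins variables to a shared parameter-server device (so all GPU towers reuse one set of weights, which is what reuse=True relies on) and pins every other op to GPU i. A minimal sketch of that placement logic, with the names and the default ps_device assumed for illustration, written in plain Python so it runs without TensorFlow:

```python
# Sketch of a per-tower device-placement helper, modeled on the
# assign_to_gpu(i, ps_device) call in the snippet above. In TensorFlow 1.x,
# tf.device() accepts a function mapping each op to a device string.
# The op-type names and the "/cpu:0" default here are assumptions.

VARIABLE_OPS = {"Variable", "VariableV2", "VarHandleOp"}

def assign_to_gpu(gpu_index, ps_device="/cpu:0"):
    """Return a device function: variables -> ps_device, other ops -> GPU i."""
    def device_fn(op_type):
        # Real code would inspect op.type on a tf.Operation; we take the
        # type string directly so the sketch is runnable without TensorFlow.
        if op_type in VARIABLE_OPS:
            return ps_device
        return "/gpu:%d" % gpu_index
    return device_fn

# Example: tower 2 keeps its weights on the CPU, its math on GPU 2.
fn = assign_to_gpu(2)
print(fn("VariableV2"))  # /cpu:0
print(fn("MatMul"))      # /gpu:2
```

Because every tower's variables land on the same parameter-server device, the variable_scope reuse in the loop makes towers 1..N share the weights created by tower 0 instead of allocating a fresh copy per GPU.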