How to run Tensorflow Estimator on multiple GPUs with data parallelism
Problem description
I have a standard tensorflow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism?
I searched the Tensorflow Docs but did not find an example; only sentences saying that it would be easy with Estimator.
Does anybody have a good example using tf.learn.Estimator? Or a link to a tutorial?
Answer
I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The following is from the tf.contrib.estimator.replicate_model_fn documentation:
...
def model_fn(...):  # See `model_fn` in `Estimator`.
  loss = ...
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
  optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
  if mode == tf.estimator.ModeKeys.TRAIN:
    # See the section below on `EstimatorSpec.train_op`.
    return EstimatorSpec(mode=mode, loss=loss,
                         train_op=optimizer.minimize(loss))
  # No change for `ModeKeys.EVAL` or `ModeKeys.PREDICT`.
  return EstimatorSpec(...)
...
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
What you need to do is wrap the optimizer with tf.contrib.estimator.TowerOptimizer and your model_fn() with tf.contrib.estimator.replicate_model_fn().
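To make the pattern concrete, here is a minimal self-contained sketch, assuming a TensorFlow 1.x installation where tf.contrib.estimator is still available; the toy model, input_fn, feature name "x", and layer/batch sizes are all made up for illustration:

import tensorflow as tf

def model_fn(features, labels, mode):
    # A toy single-layer classifier; the sizes are arbitrary for illustration.
    logits = tf.layers.dense(features["x"], units=10)
    predictions = tf.argmax(logits, axis=1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        # TowerOptimizer aggregates the gradients computed on each GPU tower.
        optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
        train_op = optimizer.minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss)

def input_fn():
    # Dummy in-memory data, just to make the sketch runnable end to end.
    x = tf.random_normal([64, 32])
    y = tf.random_uniform([64], maxval=10, dtype=tf.int64)
    return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).batch(8).repeat()

# replicate_model_fn builds one tower per visible GPU and splits each
# batch across them (data parallelism).
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
classifier.train(input_fn=input_fn, steps=100)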
I followed the description and made a TPU SqueezeNet model work on a machine with 4 GPUs. My modifications are here.