Implementation of model parallelism in tensorflow
Question
I am a beginner to tensorflow. I'm currently working on a system with 2 GPUs, each with 12 GB of memory. I want to implement model parallelism across the two GPUs to train large models. I have searched all over the internet, SO, the tensorflow documentation, etc.; I was able to find explanations of model parallelism and its results, but nowhere did I find a small tutorial or code snippet on how to implement it using tensorflow. I mean, we have to exchange activations after every layer, right? So how do we do that? Is there a specific or cleaner way of implementing model parallelism in tensorflow? It would be very helpful if you could suggest a place where I can learn to implement it, or simple code such as MNIST training on multiple GPUs using 'MODEL PARALLELISM'.
Note: I have done data parallelism as in the CIFAR10 multi-GPU tutorial, but I haven't found any implementation of model parallelism.
Answer
Here's an example. The model has some parts on GPU0, some parts on GPU1, and some parts on the CPU, so this is 3-way model parallelism.
import tensorflow as tf

with tf.device("/gpu:0"):
    # This variable and its square op are placed on GPU0.
    a = tf.Variable(tf.ones(()))
    a = tf.square(a)
with tf.device("/gpu:1"):
    # This variable and its square op are placed on GPU1.
    b = tf.Variable(tf.ones(()))
    b = tf.square(b)
with tf.device("/cpu:0"):
    # The two partial results are transferred to the CPU and summed there.
    loss = a + b

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = opt.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10):
    loss0, _ = sess.run([loss, train_op])
    print("loss", loss0)
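The same device-placement mechanism extends to a layered network: put the first layer on one GPU and the second on the other, and tensorflow inserts the cross-device tensor transfers automatically, so you do not have to exchange activations by hand. Below is a minimal sketch, not code from the original answer: the layer sizes and the random training batch are hypothetical, it uses the TF1 API via the compat layer so it also runs under TF2, and `allow_soft_placement=True` lets it fall back to CPU on machines without two GPUs.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style API; plain `import tensorflow as tf` on TF1

tf.disable_v2_behavior()

# Hypothetical sizes; a real large model would make each half fill a 12 GB GPU.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

with tf.device("/gpu:0"):
    # First half of the model lives on GPU0.
    w1 = tf.Variable(tf.random_normal([784, 256], stddev=0.1))
    b1 = tf.Variable(tf.zeros([256]))
    h1 = tf.nn.relu(tf.matmul(x, w1) + b1)

with tf.device("/gpu:1"):
    # Second half lives on GPU1; the activation h1 is copied GPU0 -> GPU1
    # automatically when this op runs.
    w2 = tf.Variable(tf.random_normal([256, 10], stddev=0.1))
    b2 = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(h1, w2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Soft placement falls back to available devices when a GPU is missing.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.global_variables_initializer())

# Hypothetical random batch standing in for MNIST data.
xs = np.random.rand(32, 784).astype(np.float32)
ys = np.eye(10)[np.random.randint(0, 10, 32)].astype(np.float32)
losses = []
for _ in range(20):
    l, _ = sess.run([loss, train_op], {x: xs, y: ys})
    losses.append(l)
print("first loss", losses[0], "last loss", losses[-1])
```

During backpropagation the gradients for each half are computed on the device that owns the corresponding forward ops, so activations flow GPU0 to GPU1 on the forward pass and gradients flow back GPU1 to GPU0, one transfer each way per step.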