tensorflow multi GPU parallel usage


Problem Description

I want to use 8 GPUs in parallel, not sequentially.

For example, when I execute this code:

import tensorflow as tf

with tf.device('/gpu:0'):
    for i in range(10):
        print(i)

with tf.device('/gpu:1'):
    for i in range(10, 20):
        print(i)

I tried the cmd command 'CUDA_VISIBLE_DEVICE='0,1'' but the result is the same.

I want to see the result "0 10 1 11 2 3 12 ... etc."

But the actual result is sequential: "0 1 2 3 4 5 ... 10 11 12 13 ..."

How can I get the desired result?

Recommended Answer

** I see an edit to the question, so I am adding this to my answer **

You need to pass your operations to a TensorFlow session; otherwise, the code will be interpreted sequentially (as in many programming languages), and the operations will be completed sequentially.
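
For instance, here is a rough sketch of the question's two loops rewritten as graph ops and executed through one session run, so the runtime can schedule them on both GPUs. The tensor names and the allow_soft_placement setting are my own additions for illustration:

import tensorflow as tf

# Build one op per GPU; nothing executes until the session runs the graph.
with tf.device('/gpu:0'):
    first_half = tf.range(10)        # 0 .. 9
with tf.device('/gpu:1'):
    second_half = tf.range(10, 20)   # 10 .. 19

# allow_soft_placement lets TensorFlow fall back to an available device;
# log_device_placement prints where each op actually ran.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    # A single run call hands both ops to the runtime, which is free to
    # evaluate them on the two GPUs concurrently.
    print(sess.run([first_half, second_half]))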

For the previous understanding of the question, a discussion of training neural networks with multiple GPUs follows:

The bad news is that there is no magic functionality that will simply do this for you.

The good news is that there are a few established methods.

The first one is something familiar to some CUDA and other GPU developers: replicate the model to multiple GPUs and synchronize through the CPU. One way to do this is to split your dataset into batches, called towers in this case, then feed each GPU a tower. If this were the MNIST dataset and you had two GPUs, you could explicitly initiate this data using the CPU as the device. Now, as the per-GPU dataset gets smaller, your relative batch size can be larger. Once you complete an epoch, you can share the gradients and average them to train both networks. Of course, this easily scales to your case with 8 GPUs.

A minimal example of distributing a task and collecting the results on the CPU can be seen below:

import tensorflow as tf

# Creates a graph: one matmul per GPU, summed on the CPU.
c = []
for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  total = tf.add_n(c)
# Creates a session with log_device_placement set to True, so the device
# chosen for each op is printed.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
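
Extending the same placement pattern to the tower idea described above, a rough sketch of splitting one batch across two GPUs and averaging the gradients on the CPU could look like the following. The toy one-layer model, the placeholder shapes, and the two-GPU count are illustrative assumptions, not a ready-made recipe:

import tensorflow as tf

def tower_loss(x, y):
    # Toy stand-in for a real model: one dense layer plus cross-entropy.
    logits = tf.layers.dense(x, 10)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

optimizer = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.int64, [None])
    # Split the incoming batch into one shard ("tower") per GPU.
    x_shards = tf.split(x, 2)
    y_shards = tf.split(y, 2)

for i in range(2):
    with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        loss = tower_loss(x_shards[i], y_shards[i])
        tower_grads.append(optimizer.compute_gradients(loss))

with tf.device('/cpu:0'):
    # Average each variable's gradient over the towers and apply once.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0),
                         grads_and_vars[0][1]))
    train_op = optimizer.apply_gradients(averaged)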

However, transferring data between many devices will prevent you from getting exactly your_gpu_number times the acceleration. Therefore, you need to optimize the workload for each GPU to maximize performance and avoid inter-device communication as much as possible.

The second one is splitting your neural network across the number of devices you have, training the pieces, and merging them.
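
As a minimal sketch of that kind of split (the two-layer toy network and layer sizes below are made up purely for illustration), each GPU holds part of the model, so the activations cross devices once per forward pass:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])

with tf.device('/gpu:0'):
    # The first part of the network lives on the first GPU.
    hidden = tf.layers.dense(x, 256, activation=tf.nn.relu)

with tf.device('/gpu:1'):
    # The second part lives on the second GPU; 'hidden' is copied between devices.
    logits = tf.layers.dense(hidden, 10)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())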

Running models explicitly on multiple GPUs requires you to structure your algorithm in that fashion. Check these out:

https://www.tensorflow.org/guide/using_gpu#using_multiple_gpus

https://gist.github.com/j-min/69aae99be6f6acfadf2073817c2f61b0
