Does TensorFlow by default use all available GPUs in the machine?


Question

I have 3 GTX Titan GPUs in my machine. I run the example provided in Cifar10 with cifar10_train.py and got the following output:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y N 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1:   N Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN, pci bus id: 0000:84:00.0)

It looks to me that TensorFlow is trying to initialize itself on two devices (gpu0 and gpu1).

My question is: why does it do this on two devices, and is there any way to prevent it? (I only want it to run as if there were a single GPU.)

Answer

See: Using GPUs

Manual device placement

If you would like a particular operation to run on a device of your choice instead of what's automatically selected for you, you can use with tf.device to create a device context such that all the operations within that context will have the same device assignment.

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

You will see that now a and b are assigned to cpu:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and available devices (gpu:0 in this example) and automatically copy tensors between devices if required.

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

---

Earlier Answer 2.

See: Using GPUs

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you will need to specify the preference explicitly:

# Creates a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

---

Earlier Answer 1.

From CUDA_VISIBLE_DEVICES – Masking GPUs

Does your CUDA application need to target a specific GPU? If you are writing GPU-enabled code, you would typically use a device query to select the desired GPUs. However, a quick and easy solution for testing is to use the environment variable CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application sees. This can be useful if you are attempting to share resources on a node or you want your GPU-enabled executable to target a specific GPU.

Environment Variable Syntax     Results

CUDA_VISIBLE_DEVICES=1          Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1        Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"      Same as above; quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3      Devices 0, 2, 3 will be visible; device 1 is masked

CUDA will enumerate the visible devices starting at zero. In the last case, devices 0, 2, 3 will appear as devices 0, 1, 2. If you change the order of the string to "2,3,0", devices 2,3,0 will be enumerated as 0,1,2 respectively. If CUDA_VISIBLE_DEVICES is set to a device that does not exist, all devices will be masked. You can specify a mix of valid and invalid device numbers. All devices before the invalid value will be enumerated, while all devices after the invalid value will be masked.
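This masking can also be applied from inside the asker's own script. A minimal sketch, assuming the variable is set before TensorFlow (or any other CUDA-using library) first initializes the driver:

```python
import os

# Expose only physical GPU 0 to this process. This must be set before
# TensorFlow initializes CUDA, i.e. before `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # TensorFlow would now see a single GPU (/gpu:0)
```

Equivalently, the variable can be set on the command line when launching the script, e.g. `CUDA_VISIBLE_DEVICES=0 python cifar10_train.py`.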

To determine the device ID for the available hardware in your system, you can run NVIDIA’s deviceQuery executable included in the CUDA SDK. Happy programming!

— Chris Mason
