Why does TensorFlow always use GPU 0?
Problem description
I hit a problem when running TensorFlow inference on multiple-GPU setups.
Environment: Python 3.6.4; TensorFlow 1.8.0; CentOS 7.3; two Nvidia Tesla P4 GPUs
Here is the nvidia-smi output when the system is free:
Tue Aug 28 10:47:42 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:00:0C.0 Off | 0 |
| N/A 38C P0 22W / 75W | 0MiB / 7606MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P4 Off | 00000000:00:0D.0 Off | 0 |
| N/A 39C P0 23W / 75W | 0MiB / 7606MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The key statements related to my issue:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
def get_sess_and_tensor(ckpt_path):
assert os.path.exists(ckpt_path), "file: {} not exist.".format(ckpt_path)
graph = tf.Graph()
with graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(ckpt_path, "rb") as fid1:
od_graph_def.ParseFromString(fid1.read())
tf.import_graph_def(od_graph_def, name="")
sess = tf.Session(graph=graph)
with tf.device('/gpu:1'):
tensor = graph.get_tensor_by_name("image_tensor:0")
boxes = graph.get_tensor_by_name("detection_boxes:0")
scores = graph.get_tensor_by_name("detection_scores:0")
classes = graph.get_tensor_by_name('detection_classes:0')
return sess, tensor, boxes, scores, classes
So, the problem is: even though I set the visible devices to '0,1' and set tf.device to GPU 1, when running inference I see from nvidia-smi that only GPU 0 is used (GPU 0's GPU-Util is high, almost 100%, while GPU 1's stays at 0%). Why doesn't it use GPU 1?
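For reference, one way to confirm that both cards are visible to the TensorFlow process at all is to list the local devices (a minimal sketch using the TF 1.x device_lib helper; the expected device names are an assumption based on the two-GPU setup above):

from tensorflow.python.client import device_lib

# With CUDA_VISIBLE_DEVICES="0,1" this should list /device:GPU:0 and /device:GPU:1
# alongside the CPU device.
print([d.name for d in device_lib.list_local_devices()])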
I want to use the two GPUs in parallel, but even with the following code, it still uses only GPU 0:
with tf.device('/gpu:0'):
    tensor = graph.get_tensor_by_name("image_tensor:0")
    boxes = graph.get_tensor_by_name("detection_boxes:0")
with tf.device('/gpu:1'):
    scores = graph.get_tensor_by_name("detection_scores:0")
    classes = graph.get_tensor_by_name('detection_classes:0')
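One way to check where the operations actually end up is to enable device placement logging when the session is created (a minimal sketch, assuming the TF 1.x ConfigProto options and the graph object from get_sess_and_tensor above):

# log_device_placement prints the device chosen for every op at run time;
# allow_soft_placement lets TensorFlow fall back to another device instead of
# failing when a requested placement cannot be satisfied.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
sess = tf.Session(graph=graph, config=config)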
Any suggestions are greatly appreciated.
Thanks.
Wesley
You can use the GPUtil package to select unused GPUs and filter the CUDA_VISIBLE_DEVICES environment variable.
This will allow you to run parallel experiments on all your GPUs.
# Import os to set the environment variable CUDA_VISIBLE_DEVICES
import os
import tensorflow as tf
import GPUtil
# Set CUDA_DEVICE_ORDER so the IDs assigned by CUDA match those from nvidia-smi
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Get the first available GPU
DEVICE_ID_LIST = GPUtil.getFirstAvailable()
DEVICE_ID = DEVICE_ID_LIST[0] # grab first element from list
# Set CUDA_VISIBLE_DEVICES to mask out all other GPUs than the first available device id
os.environ["CUDA_VISIBLE_DEVICES"] = str(DEVICE_ID)
# Since all other GPUs are masked out, the first available GPU will now be identified as GPU:0
device = '/gpu:0'
print('Device ID (unmasked): ' + str(DEVICE_ID))
print('Device ID (masked): ' + str(0))
# Run a minimum working example on the selected GPU
# Start a session
with tf.Session() as sess:
    # Select the device
    with tf.device(device):
        # Declare two numbers and add them together in TensorFlow
        a = tf.constant(12)
        b = tf.constant(30)
        result = sess.run(a+b)
        print('a+b=' + str(result))
Reference: https://github.com/anderskm/gputil
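As a follow-up, if the goal is to spread work across both cards rather than pick a single free one, the same package can return several device IDs (a sketch assuming GPUtil's getAvailable helper; the load and memory thresholds are illustrative):

import os
import GPUtil

# Keep the CUDA device ordering consistent with nvidia-smi.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Ask for up to two GPUs that are currently idle enough to use.
device_ids = GPUtil.getAvailable(order='first', limit=2, maxLoad=0.5, maxMemory=0.5)

# Expose only those GPUs to TensorFlow; inside this process they are renumbered
# as /gpu:0, /gpu:1, ... in the order listed here. Set this before creating a session.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
print('Using GPUs: ' + str(device_ids))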