How to get currently available GPUs in TensorFlow?
Question
I plan to use distributed TensorFlow, and I saw that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have zero, one, or more GPUs, and I want to run my TensorFlow graph on the GPUs of as many machines as possible.
I found that when running tf.Session(), TensorFlow reports information about the GPUs in log messages like these:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
My question is: how do I get information about the currently available GPUs from TensorFlow? I can read the loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I might also restrict the GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I am not looking for a way to get GPU information from the OS kernel.
In short, I want a function like tf.get_available_gpus() that returns ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?
Recommended answer
There is an undocumented method called device_lib.list_local_devices() that lists the devices available in the local process. (N.B. As an undocumented method, it is subject to backwards-incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:
from tensorflow.python.client import device_lib

def get_available_gpus():
    # Each entry is a DeviceAttributes proto; keep only the GPU device names.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
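The exact string stored in x.name has varied between releases: older versions report names such as '/gpu:0', while later ones report '/device:GPU:0'. If the short form from the question is wanted regardless of version, the names can be normalized afterwards. A minimal sketch, where to_short_gpu_name and short_gpu_names are hypothetical helpers written for illustration, not TensorFlow APIs:

```python
import re

# Assumption: local GPU names look like '/gpu:0' or '/device:GPU:0'.
_GPU_NAME = re.compile(r'^/(?:device:)?gpu:(\d+)$', re.IGNORECASE)


def to_short_gpu_name(device_name):
    """Return '/gpu:N' for a GPU device name, or None for anything else."""
    match = _GPU_NAME.match(device_name)
    return '/gpu:{}'.format(match.group(1)) if match else None


def short_gpu_names(device_names):
    """Keep only GPU names from an iterable and convert them to the short form."""
    short = (to_short_gpu_name(name) for name in device_names)
    return [name for name in short if name is not None]
```

The list returned by get_available_gpus() can be passed straight to short_gpu_names(); names that do not match the assumed patterns are dropped rather than guessed at.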
Note that (at least up to TensorFlow 1.4) calling device_lib.list_local_devices() runs some initialization code that, by default, allocates all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, so that not all of the memory is allocated. See this question for more details.
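One way to follow that advice is sketched below. It assumes a TensorFlow build that exposes the 1.x session API (through tensorflow.compat.v1 on newer releases); the function name is made up for illustration. Whether the session's GPU options carry over to device_lib.list_local_devices() may depend on the TensorFlow version, so treat this as a starting point rather than a guaranteed fix.

```python
def get_available_gpus_with_growth():
    """List GPU device names after opening a session with allow_growth=True."""
    # Imports are local so that defining this function does not load TensorFlow.
    try:
        import tensorflow.compat.v1 as tf  # newer releases
    except ImportError:
        import tensorflow as tf  # older 1.x releases without the compat module
    from tensorflow.python.client import device_lib

    config = tf.ConfigProto()
    # Ask TensorFlow to grow GPU memory on demand instead of reserving it all.
    config.gpu_options.allow_growth = True
    # Alternative: config.gpu_options.per_process_gpu_memory_fraction = 0.1

    # Open the session first, as suggested above, then enumerate the devices.
    with tf.Session(config=config):
        local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
```

On a machine without GPUs, or with CUDA_VISIBLE_DEVICES set to an empty string, the function should simply return an empty list.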