How to get current available GPUs in tensorflow?


Problem Description

I have a plan to use distributed TensorFlow, and I saw that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0, 1, or more GPUs, and I want to run my TensorFlow graph on GPUs in as many machines as possible.

I found that when running tf.Session(), TensorFlow prints information about the GPUs in log messages like the ones below:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)

My question is: how do I get information about the currently available GPUs from TensorFlow? I can get the loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I could also restrict the visible GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want a way that gets GPU information from the OS kernel.
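
For instance, a minimal sketch of that kind of restriction done from Python itself (an assumption for illustration: the value '0' is just an example, and it must be set before TensorFlow initializes CUDA):

import os

# Expose only the first GPU to this process; set this before importing
# or initializing TensorFlow so the CUDA runtime picks it up.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'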

In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?

Recommended Answer

There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # Query every device visible to the local process (CPU and GPU).
    local_device_protos = device_lib.list_local_devices()
    # Keep only the GPU entries and return their device name strings.
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
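
A usage sketch (the exact name strings depend on the TensorFlow version; older releases report names like '/gpu:0', while later ones report '/device:GPU:0'):

print(get_available_gpus())
# e.g. ['/gpu:0', '/gpu:1'] on a machine with two visible GPUs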

Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (see the related GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory being allocated. See this question for more details.
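
A minimal sketch of that workaround, assuming the TensorFlow 1.x Session API (the 0.1 fraction is an arbitrary example value):

import tensorflow as tf
from tensorflow.python.client import device_lib

# Cap the GPU memory the first session may claim, so that the
# device-listing initialization does not grab all GPU memory.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.1)
# Alternatively: gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

gpus = [x.name for x in device_lib.list_local_devices()
        if x.device_type == 'GPU']
print(gpus)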
