How to programmatically determine available GPU memory with TensorFlow?

Views: 89 Posted: 2020/11/20 0:53:57 Tags: python tensorflow gpu


Problem description

For a vector quantization (k-means) program, I would like to know how much memory is available on the current GPU (if there is one). I need this to choose an optimal batch size, so that the complete data set can be processed in as few batches as possible.
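To make the goal concrete, here is a minimal sketch of the arithmetic, assuming the number of free bytes is already known. The function name `plan_batches` and the `safety` factor are hypothetical; the factor is only a rough allowance for the intermediate tensors k-means needs (for example distance matrices), not a measured value.

```python
import math

def plan_batches(n_points, dim, free_bytes, bytes_per_value=4, safety=0.5):
    """Return (batch_size, n_batches) for n_points vectors of length dim."""
    bytes_per_row = dim * bytes_per_value      # float32 takes 4 bytes per value
    usable = int(free_bytes * safety)          # assumed headroom for intermediates
    # Use as many rows as fit, but at least 1 and at most the whole data set.
    batch_size = max(1, min(n_points, usable // bytes_per_row))
    n_batches = math.ceil(n_points / batch_size)
    return batch_size, n_batches

# Example: 10,000 vectors of dimension 250,000 with 4 GiB free.
print(plan_batches(n_points=10_000, dim=250_000, free_bytes=4 * 1024**3))
```

With these example numbers each row takes 1,000,000 bytes, so half of 4 GiB holds 2147 rows and the data set needs 5 batches.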

I have written the following test program:

import tensorflow as tf
import numpy as np
from kmeanstf import KMeansTF
print("GPU Available: ", tf.test.is_gpu_available())

nn=1000
dd=250000
# size in bytes of one float32 tensor of shape (nn, dd)
print("{:,d} bytes".format(nn*dd*4))
dic = {}
# create four such tensors and keep references so none of them is freed
for x in "ABCD":
    dic[x]=tf.random.normal((nn,dd))
    print(x,dic[x][:1,:2])

print("done...")

This is typical output on my system (Ubuntu 18.04 LTS, GTX-1060 6GB). Note the core dump.

python misc/maxmem.py 
GPU Available:  True
1,000,000,000 bytes
A tf.Tensor([[-0.23787294 -2.0841186 ]], shape=(1, 2), dtype=float32)
B tf.Tensor([[ 0.23762687 -1.1229591 ]], shape=(1, 2), dtype=float32)
C tf.Tensor([[-1.2672468   0.92139906]], shape=(1, 2), dtype=float32)
2020-01-02 17:35:05.988473: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 953.67MiB (rounded to 1000000000).  Current allocation summary follows.
2020-01-02 17:35:05.988752: W tensorflow/core/common_runtime/bfc_allocator.cc:424] **************************************************************************************************xx
2020-01-02 17:35:05.988835: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[1000,250000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Segmentation fault (core dumped)

Occasionally I get an error from Python instead of a core dump (see below). That would actually be preferable, since I could catch it and determine the maximum available memory by trial and error. But it alternates with core dumps:
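The trial-and-error idea could look roughly like the sketch below. It assumes the allocation fails with a catchable `tf.errors.ResourceExhaustedError`, which, as the outputs here show, is not guaranteed: a segmentation fault cannot be caught in-process, so each probe would probably have to run in a separate process to be safe. The helper names `largest_ok` and `tf_try_alloc` are hypothetical.

```python
def largest_ok(try_alloc, lo, hi):
    """Binary search for the largest n in [lo, hi] where try_alloc(n) is True.

    Returns 0 if no n succeeds. Assumes try_alloc is monotone: once a size
    fails, every larger size fails too.
    """
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if try_alloc(mid):
            best, lo = mid, mid + 1   # mid fits, look for something larger
        else:
            hi = mid - 1              # mid does not fit, look lower
    return best

def tf_try_alloc(n_rows, dim=250_000):
    """Try to create one float32 tensor of shape (n_rows, dim) with TensorFlow."""
    import tensorflow as tf  # lazy import: the search itself does not need TF
    try:
        t = tf.random.normal((n_rows, dim))
        del t
        return True
    except tf.errors.ResourceExhaustedError:
        return False

# Demo with a stand-in allocator that "fits" up to 3,500 rows, so no GPU is needed.
print(largest_ok(lambda n: n <= 3500, 1, 10_000))
```

On a real system one would call `largest_ok(tf_try_alloc, 1, upper_bound)` with some chosen upper bound, keeping in mind the core-dump caveat above.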

python misc/maxmem.py 
GPU Available:  True
1,000,000,000 bytes
A tf.Tensor([[-0.73510283 -0.94611156]], shape=(1, 2), dtype=float32)
B tf.Tensor([[-0.8458411  0.552555 ]], shape=(1, 2), dtype=float32)
C tf.Tensor([[0.30532074 0.266423  ]], shape=(1, 2), dtype=float32)
2020-01-02 17:35:26.401156: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 953.67MiB (rounded to 1000000000).  Current allocation summary follows.
2020-01-02 17:35:26.401486: W tensorflow/core/common_runtime/bfc_allocator.cc:424] **************************************************************************************************xx
2020-01-02 17:35:26.401571: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[1000,250000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "misc/maxmem.py", line 11, in <module>
    dic[x]=tf.random.normal((nn,dd))
  File "/home/fritzke/miniconda2/envs/tf20b/lib/python3.7/site-packages/tensorflow_core/python/ops/random_ops.py", line 76, in random_normal
    value = math_ops.add(mul, mean_tensor, name=name)
  File "/home/fritzke/miniconda2/envs/tf20b/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 391, in add
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1000,250000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Add] name: random_normal/

How can I reliably get this information on whatever system the software is running on?
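One possible direction, offered only as a sketch and not as a confirmed answer from this thread, is to ask the NVIDIA driver directly through the `nvidia-smi` command-line tool rather than through TensorFlow. This assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and it should be called before TensorFlow initializes the GPU, because by default TensorFlow reserves nearly all GPU memory on first use. The function names are hypothetical.

```python
import shutil
import subprocess

def parse_free_mib(text):
    """Parse nvidia-smi output with one integer (MiB) per line, one line per GPU."""
    return [int(line) for line in text.splitlines() if line.strip()]

def free_gpu_memory_mib():
    """Return free memory in MiB for each GPU, or [] if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA tooling found on this system
    cmd = ["nvidia-smi", "--query-gpu=memory.free",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True,
                             check=True, timeout=10).stdout
        return parse_free_mib(out)
    except (subprocess.SubprocessError, OSError, ValueError):
        return []  # tool failed, timed out, or printed something unexpected

print(free_gpu_memory_mib())
```

An empty list means no usable reading was obtained, in which case the program could fall back to a conservative default batch size.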
