Tensorflow - GPU dedicated vs shared memory


Problem description


Does Tensorflow use only dedicated GPU memory or can it also use shared memory?

Also I ran this:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0" device_type: "CPU" memory_limit: 268435456
name: "/device:GPU:0" device_type: "GPU" memory_limit: 112128819

What are these "memory limits", 268,435,456 and 112,128,819?

Here is what I am talking about: when I run TF on Win10, shared memory usage is always zero, and yet I get a ResourceExhaustedError if my batch size is too large. It seems the shared memory is never used.

Solution

In my experience, Tensorflow only uses the dedicated GPU memory, as described below. At that time, memory_limit = max dedicated memory - current dedicated memory usage (as observed in the Win10 Task Manager).

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Output:

physical_device_desc: "device: XLA_CPU device",
name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 2196032718
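
As an aside (not part of the original answer), the amount of dedicated memory TensorFlow is allowed to allocate can also be capped explicitly. A minimal sketch, assuming TensorFlow 2.x; the 1024 MB cap is an arbitrary example value:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Must run before TensorFlow initializes the GPU (i.e. before any op executes).
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])  # MB of dedicated memory
    print(tf.config.experimental.list_logical_devices('GPU'))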

To verify this, I tried running a single large task (the Tensorflow 2 benchmark from https://github.com/aime-team/tf2-benchmarks); it gives a "Resource exhausted" error as below on a GTX 1060 3GB with Tensorflow 2.3.0.

2021-01-20 01:50:53.738987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 3GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 9 deviceMemorySize: 3.00GiB deviceMemoryBandwidth: 178.99GiB/s

Limit:                      2196032718
InUse:                      1997814016
MaxInUse:                   2155556352
NumAllocs:                        1943
MaxAllocSize:                551863552
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-01-20 01:51:21.393175: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_ops.cc:539 : Resource exhausted: OOM when allocating tensor with shape[64,256,56,56] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):

I have tried to do the same with multiple small tasks. TF tries to use the shared GPU memory when multiple tasks run in different Jupyter kernels, but the newer task ultimately fails.

For example, with two similar Xception models (a rough sketch of such a task follows the error below):

Task 1: runs without an error

Task 2: fails with the error below

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node xception/block1_conv1/Conv2D (defined at <ipython-input-25-0c5fe80db9f1>:3) ]] [Op:__inference_predict_function_5303]

Function call stack:
predict_function
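
A minimal sketch of the kind of per-kernel task used here (the pretrained weights, batch size, and input shape are placeholders, not taken from the original):

import numpy as np
import tensorflow as tf

# Load a pretrained Xception model (each Jupyter kernel runs its own copy).
model = tf.keras.applications.Xception(weights="imagenet")

# Dummy batch with Xception's default 299x299 RGB input size.
batch = np.random.rand(8, 299, 299, 3).astype("float32")

# In the second kernel, this predict call is where the cuDNN/convolution error appears.
preds = model.predict(batch)
print(preds.shape)  # (8, 1000) ImageNet class scores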

[Screenshot: GPU memory usage at the failure (note the usage of shared memory at the start of Task 2)]
