Why is a GeForce GTX 1080 Ti slower than a Quadro K1200 when training an RNN model?
Problem description
Problem type: regression
Inputs: sequence length varies from 14 to 39; each sequence point is a 4-element vector.
Output: scalar
Neural network: 3-layer Bi-LSTM (hidden vector size: 200) followed by 2 fully connected layers (see the sketch after this list)
Batch size: 30
Number of samples per epoch: ~7,000
TensorFlow version: tf-nightly-gpu 1.6.0-dev20180112
CUDA version: 9.0
cuDNN version: 7
Details of the two GPUs:
GPU 0: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582 totalMemory: 11.00GiB freeMemory: 10.72GiB
nvidia-smi during the run (using 1080 Ti only):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 385.69                 Driver Version: 385.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108... WDDM  | 00000000:02:00.0 Off |                  N/A |
| 20%   37C    P2    58W / 250W | 10750MiB / 11264MiB  |     10%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K1200       WDDM  | 00000000:03:00.0  On |                  N/A |
| 39%   35C    P8     1W /  31W |   751MiB /  4096MiB  |      2%      Default |
+-------------------------------+----------------------+----------------------+
GPU 1: name: Quadro K1200 major: 5 minor: 0 memoryClockRate(GHz): 1.0325 totalMemory: 4.00GiB freeMemory: 3.44GiB
nvidia-smi during the run (using K1200 only):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 385.69                 Driver Version: 385.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108... WDDM  | 00000000:02:00.0 Off |                  N/A |
| 20%   29C    P8     8W / 250W |   136MiB / 11264MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K1200       WDDM  | 00000000:03:00.0  On |                  N/A |
| 39%   42C    P0     6W /  31W |  3689MiB /  4096MiB  |     23%      Default |
+-------------------------------+----------------------+----------------------+
Time spent for 1 epoch:
GPU 0 only (environment var "CUDA_VISIBLE_DEVICES"=0): ~60 minutes
GPU 1 only (environment var "CUDA_VISIBLE_DEVICES"=1): ~45 minutes
Env. var. "TF_MIN_GPU_MULTIPROCESSOR_COUNT=4" was set during both tests (see the sketch below).
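For reference, a minimal sketch of how each single-GPU run was pinned, assuming the environment variables are set in Python before TensorFlow first touches CUDA (exporting them in the shell before launching works equally well):

import os

# Expose only one GPU to TensorFlow: "0" for the 1080 Ti run, "1" for the K1200 run.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Lower TF's multiprocessor-count cutoff (default 8) so the small K1200 (4 SMs) is usable.
os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"] = "4"

import tensorflow as tf  # import after setting the env vars so they take effect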
Why is the better GPU (GeForce GTX 1080 Ti) slower at training my neural network?
Thanks.
Update
Another set of tests, on the MNIST dataset with a CNN model, showed the same pattern:
Time spent training 17 epochs:
GPU 0 (1080 Ti): ~59 minutes
GPU 1 (K1200): ~45 minutes
Answer
The official TensorFlow documentation has a section "Allowing GPU memory growth" that introduces two session options for controlling GPU memory allocation. I tried them separately when training my RNN model (using only the GeForce GTX 1080 Ti):
- config.gpu_options.allow_growth = True
- config.gpu_options.per_process_gpu_memory_fraction = 0.05
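For context, a minimal sketch of how either option is passed to a TF 1.x session; the model construction and training loop are elided:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                       # grow the allocation on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.05  # or: cap at 5% of GPU memory

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training ops here ...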
Both of them shortened the training time from the original ~60 minutes per epoch to ~42 minutes per epoch. I still don't understand why this helps. If you can explain it, I will accept that as the answer. Thanks.