How to make TensorFlow use 100% of the GPU?
Question
I have a laptop with an RTX 2060 GPU, and I'm using Keras and TF 2 to train an LSTM on it. I'm also monitoring GPU usage with nvidia-smi, and I noticed that the Jupyter notebook and TF use at most 35%; usually GPU usage stays between 10% and 25%.
Under the current conditions, training this model took more than 7 hours. Am I doing something wrong, or is this a limitation of Keras and TF?
My nvidia-smi output:
Sun Nov 3 00:07:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   51C    P3    22W /  N/A |    834MiB /  5931MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1032      G   /usr/lib/xorg/Xorg                           330MiB |
|    0      1251      G   /usr/bin/gnome-shell                         333MiB |
|    0      1758      G   ...equest-channel-token=622209288718607755   121MiB |
|    0      5086      G   ...uest-channel-token=12207632792533837012    47MiB |
+-----------------------------------------------------------------------------+
My LSTM:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
# X_train, y_train and cp_callback are defined elsewhere in the notebook (not shown in the question).
regressor = Sequential()
# First LSTM layer declares the input shape: (timesteps, 3 features).
regressor.add(LSTM(units=180, return_sequences=True, input_shape=(X_train.shape[1], 3)))
regressor.add(Dropout(0.2))
# Four more stacked LSTM layers that keep returning the full sequence.
for _ in range(4):
    regressor.add(LSTM(units=180, return_sequences=True))
    regressor.add(Dropout(0.2))
# The last LSTM layer returns only the output of the final timestep.
regressor.add(LSTM(units=180))
regressor.add(Dropout(0.2))
# Single-value regression output.
regressor.add(Dense(units=1))
regressor.compile(optimizer='adam', loss='mean_squared_error')
regressor.fit(X_train, y_train, epochs=10, batch_size=32, callbacks=[cp_callback])
Recommended answer
TensorFlow automatically takes care of optimizing GPU resource allocation via CUDA and cuDNN, assuming the latter are properly installed. The usage statistic you're seeing doesn't measure how much of the GPU's compute is being used: nvidia-smi's GPU-Util is the percentage of the sampling period during which at least one kernel was running, regardless of how many of the GPU's cores that kernel kept busy; see this answer. That your utilization is "only" 25% isn't a problem in itself: your model isn't large as-is, and growing it substantially just to push that number up could run the GPU out of memory (OOM).
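The caveat that CUDA and cuDNN must be properly installed can be checked from Python before tuning anything else. The sketch below assumes TF 2.1 or later, where tf.config.list_physical_devices is available (TF 2.0 exposes it as tf.config.experimental.list_physical_devices); the helper name gpu_setup_report is made up for this example.

```python
import tensorflow as tf


def gpu_setup_report():
    """Summarize whether this TensorFlow build has CUDA support and which GPUs it sees.

    Hypothetical helper for illustration; assumes TF 2.1+.
    """
    # True if this TensorFlow binary was compiled with CUDA support.
    built_with_cuda = tf.test.is_built_with_cuda()
    # Physical GPUs TensorFlow detected at startup; an empty list means CPU only.
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "built_with_cuda": bool(built_with_cuda),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    print(gpu_setup_report())
```

If gpus comes back empty, training is running on the CPU, and none of the GPU tuning below will change anything until the driver, CUDA, and cuDNN setup is fixed.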
To increase utilization, increase the batch size, the model size, or anything else that increases the parallelism of the computations. Note that making the model deeper would increase the GPU's memory utilization, but far less so its compute utilization.
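To see why batch size matters here: Model.fit runs one optimizer step per batch, and the steps run one after another, so with batch_size=32 each step hands the GPU only 32 sequences to process in parallel. The arithmetic below counts steps per epoch; the sample count is a made-up placeholder, since the question doesn't give the size of X_train, and steps_per_epoch is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, counting the final partial batch."""
    return math.ceil(num_samples / batch_size)


# Placeholder dataset size, for illustration only.
NUM_SAMPLES = 100_000

for batch_size in (32, 128, 512):
    # Prints "32 3125", then "128 782", then "512 196".
    print(batch_size, steps_per_epoch(NUM_SAMPLES, batch_size))
```

Moving from 32 to 512 means 16x fewer sequential steps, each working on a 16x larger batch. Whether that shortens wall-clock time, and whether the learning rate then needs retuning, has to be measured on the actual model; in the question's code the knob is the batch_size argument of regressor.fit.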
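Also, consider a cuDNN-backed LSTM instead of the generic LSTM implementation: it can run up to 10x faster and use less GPU memory (courtesy of algorithmic artisanship), though it draws more on compute. In standalone Keras and TF 1.x this is the separate CuDNNLSTM layer. TF 2.x has no tf.keras.layers.CuDNNLSTM, because tf.keras.layers.LSTM switches to the cuDNN kernel by itself when a GPU is available and the layer keeps activation='tanh', recurrent_activation='sigmoid', recurrent_dropout=0, unroll=False, use_bias=True, and unmasked (or strictly right-padded) inputs. The LSTM layers in the question meet these conditions, so they are likely already on the cuDNN path. Lastly, inserting a Conv1D layer with strides > 1 as the first layer will significantly increase training speed by shortening the input sequence, without necessarily harming performance (it can in fact improve it).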
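A minimal sketch of both suggestions in TF 2.x, assuming 60 timesteps and 3 features (the question only shows input_shape=(X_train.shape[1], 3)): a strided Conv1D front end, followed by LSTM layers left at their default arguments so TF can pick the cuDNN kernel. The filter count, kernel size, and the reduction to two LSTM layers are arbitrary choices for illustration, and build_model is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_model(timesteps: int, n_features: int = 3) -> tf.keras.Model:
    """Strided Conv1D front end followed by cuDNN-eligible LSTM layers (illustrative sizes)."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        # strides=2 with padding="same" halves the sequence length (rounding up).
        layers.Conv1D(filters=64, kernel_size=3, strides=2, padding="same", activation="relu"),
        # Default arguments (tanh/sigmoid, recurrent_dropout=0, unroll=False, use_bias=True)
        # keep these layers eligible for the cuDNN kernel when a GPU is available.
        layers.LSTM(180, return_sequences=True),
        layers.Dropout(0.2),
        layers.LSTM(180),
        layers.Dropout(0.2),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


if __name__ == "__main__":
    build_model(timesteps=60).summary()
```

With padding="same" and strides=2, a 60-step sequence reaches the first LSTM as 30 steps, so each recurrent layer does roughly half the sequential work per sample.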
Update: overclocking the GPU is an option, but I'd advise against it, as it can wear out the GPU in the long run (and all DL is "long run"). There's also "over-volting" and other hardware tweaks, but all of these should be reserved for short workloads. What will make the greatest difference is your input data pipeline.
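The answer doesn't spell out what a better input pipeline looks like. One common pattern, sketched here under the assumption that the data fits in memory as NumPy arrays (as X_train and y_train appear to), is a tf.data pipeline that shuffles, batches, and prefetches so the CPU prepares upcoming batches while the GPU trains on the current one. make_dataset and the random placeholder arrays are hypothetical; tf.data.AUTOTUNE needs TF 2.4+ (older 2.x releases spell it tf.data.experimental.AUTOTUNE).

```python
import numpy as np
import tensorflow as tf


def make_dataset(x, y, batch_size=256, shuffle_buffer=10_000):
    """Shuffled, batched, prefetched tf.data pipeline over in-memory arrays."""
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    # Shuffle within a buffer; by default the order changes every epoch.
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.batch(batch_size)
    # Let tf.data decide how many batches to prepare ahead of the training step.
    ds = ds.prefetch(tf.data.AUTOTUNE)
    return ds


if __name__ == "__main__":
    # Random placeholder data shaped (samples, timesteps, features).
    x = np.random.rand(1000, 60, 3).astype("float32")
    y = np.random.rand(1000, 1).astype("float32")
    for xb, yb in make_dataset(x, y).take(1):
        print(xb.shape, yb.shape)
```

The resulting dataset can go straight into fit, e.g. regressor.fit(make_dataset(X_train, y_train), epochs=10, callbacks=[cp_callback]); batch_size is then left out of the fit call, because batching already happens in the pipeline.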