How to make TensorFlow use 100% of GPU?


Problem Description


I have a laptop with an RTX 2060 GPU, and I am using Keras and TF 2 to train an LSTM on it. I am also monitoring GPU usage with nvidia-smi, and I noticed that the Jupyter notebook and TF use at most 35%, with the GPU usually sitting at 10-25% utilization.

Under the current conditions it took more than 7 hours to train this model. I want to know whether I am doing something wrong or whether this is a limitation of Keras and TF.

My nvidia-smi output:

Sun Nov  3 00:07:37 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   51C    P3    22W /  N/A |    834MiB /  5931MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1032      G   /usr/lib/xorg/Xorg                           330MiB |
|    0      1251      G   /usr/bin/gnome-shell                         333MiB |
|    0      1758      G   ...equest-channel-token=622209288718607755   121MiB |
|    0      5086      G   ...uest-channel-token=12207632792533837012    47MiB |
+-----------------------------------------------------------------------------+

My LSTM:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout

regressor = Sequential()

regressor.add(LSTM(units = 180, return_sequences = True, input_shape = (X_train.shape[1], 3)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 180, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 180, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 180, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 180, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 180))
regressor.add(Dropout(0.2))

regressor.add(Dense(units = 1))

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

# cp_callback is a checkpoint callback (e.g. a ModelCheckpoint) defined elsewhere in the asker's code
regressor.fit(X_train, y_train, epochs = 10, batch_size = 32, callbacks=[cp_callback])

Solution

TensorFlow automatically takes care of optimizing GPU resource allocation via CUDA & cuDNN, assuming the latter is properly installed. The usage statistics you're seeing are mainly those of memory/compute resource 'activity', not necessarily utility (execution); see this answer. That your utility is "only" 25% is a good thing - otherwise, if you substantially increased your model size (which isn't large as-is), you'd OOM.
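As a quick sanity check first (a minimal sketch using standard TF 2 APIs, not part of the original answer), you can confirm that TF actually sees the GPU and, optionally, let it allocate memory on demand:

import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list points to a broken CUDA/cuDNN install.
gpus = tf.config.experimental.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)

# Optional: grow GPU memory as needed instead of reserving it all at start-up.
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)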

To increase usage, increase the batch size, the model size, or whatever else would increase the parallelism of computations; note that making the model deeper would increase the GPU's memory utility, but far less so its compute utility.
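For example, raising batch_size in the existing fit call is the simplest change; 256 below is only an arbitrary illustration - push it up until nvidia-smi shows you nearing the memory limit:

# Larger batches give the GPU more parallel work per step; 256 is an example value only.
regressor.fit(X_train, y_train, epochs = 10, batch_size = 256, callbacks=[cp_callback])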

Also, consider using CuDNNLSTM instead of LSTM, which can run 10x faster and use less GPU memory (courtesy of algorithmic artisanship), but with higher compute utility. Lastly, inserting Conv1D as the first layer with strides > 1 will significantly increase training speed by reducing the input size, without necessarily harming performance (it can in fact improve it).
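A rough sketch of both ideas combined is below; the filter count, kernel size, stride, and layer sizes are illustrative only. Note that CuDNNLSTM is the explicit TF 1.x layer; in TF 2, tf.keras.layers.LSTM automatically uses the cuDNN kernel when run on GPU, as long as its default activations are kept and recurrent_dropout stays at 0.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, LSTM, Dropout, Dense

regressor = Sequential()

# Strided Conv1D front-end: shrinks the sequence roughly 4x before the recurrent layers.
regressor.add(Conv1D(filters = 64, kernel_size = 5, strides = 4, padding = 'same',
                     activation = 'relu', input_shape = (X_train.shape[1], 3)))

# Default-configured LSTMs are eligible for the cuDNN kernel on GPU.
regressor.add(LSTM(units = 180, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 180))
regressor.add(Dropout(0.2))

regressor.add(Dense(units = 1))

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')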


Update: overclocking the GPU is an option, but I'd advise against it, as it can wear out the GPU in the long run (and all DL is "long run"). There's also "over-volting" and other hardware tweaks, but these should be reserved for short-lived workloads. What will make the greatest difference is your input data pipeline.
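As an illustration of that last point, a tf.data pipeline that shuffles, batches, and prefetches on the CPU while the GPU is busy often removes input stalls; this is a generic sketch reusing the arrays from the question, not something prescribed by the answer:

import tensorflow as tf

# Overlap data preparation with GPU compute: shuffle, batch, and prefetch ahead of training.
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset = dataset.shuffle(buffer_size = 10000)              # buffer size is an arbitrary example
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)   # prepare next batches while the GPU works

# When a Dataset is passed, the batch size comes from the pipeline rather than fit().
regressor.fit(dataset, epochs = 10, callbacks=[cp_callback])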
