How to train a model on multiple GPUs with TensorFlow 2 and Keras?


Problem description

I have an LSTM model that I want to train on multiple GPUs. I converted the code to do this, and in nvidia-smi I can see that it is using all the memory of all the GPUs and each GPU is at around 40% utilization, BUT the estimated training time per batch is almost the same as with a single GPU.

Can someone please guide me and tell me how I can train properly on multiple GPUs?

My code:

import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout

import os
from tensorflow.keras.callbacks import ModelCheckpoint

# X_train and y_train are assumed to be defined earlier in the script.

checkpoint_path = "./model/"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = ModelCheckpoint(filepath=checkpoint_path, save_freq='epoch', verbose=1)

# NNET - LSTM
# Build and compile the model inside the distribution scope so its
# variables are mirrored across all visible GPUs.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    regressor = Sequential()

    regressor.add(LSTM(units=180, return_sequences=True, input_shape=(X_train.shape[1], 3)))
    regressor.add(Dropout(0.2))

    regressor.add(LSTM(units=180, return_sequences=True))
    regressor.add(Dropout(0.2))

    regressor.add(LSTM(units=180))
    regressor.add(Dropout(0.2))

    regressor.add(Dense(units=4))

    regressor.compile(optimizer='adam', loss='mean_squared_error')

regressor.fit(X_train, y_train, epochs=10, batch_size=32, callbacks=[cp_callback])

Answer

Assume that your batch_size for a single GPU is N and the time taken per batch is X seconds.

You can measure training speed by timing how long the model takes to converge, but you have to make sure you feed in the right batch_size with 2 GPUs: since 2 GPUs have twice the memory of a single GPU, you should linearly scale your batch_size to 2N. It might be misleading to see that the model still takes X seconds per batch, but you should keep in mind that your model now sees 2N samples per batch, which leads to quicker convergence because you can now train with a higher learning rate.
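
As a minimal sketch of that scaling (assuming the regressor, cp_callback, X_train and y_train from the question are already defined), the global batch size can be derived from the replica count reported by MirroredStrategy:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

per_replica_batch_size = 32  # the single-GPU batch size N from the question
# num_replicas_in_sync is the number of devices the strategy mirrors across
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Keras splits the global batch across replicas, so each GPU still
# processes N samples per step.
regressor.fit(X_train, y_train, epochs=10, batch_size=global_batch_size,
              callbacks=[cp_callback])

With the batch size doubled, the per-batch wall time staying near X seconds is expected; what improves is throughput in samples per second.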

If both of your GPUs have their memory allocated but are sitting at around 40% utilization, there might be multiple reasons:

  • The model is too simple and doesn't need all that compute.
  • Your batch_size is small and your GPUs can handle a bigger one.
  • Your CPU is the bottleneck, making the GPUs wait for data to be ready; this can be the case when you see spikes in GPU utilization.
  • You need to write a better, more performant data pipeline. You can find more about efficient data input pipelines here - https://www.tensorflow.org/guide/data_performance (a minimal sketch follows below).
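
As a rough sketch in the spirit of that guide (again assuming X_train, y_train, regressor, cp_callback and the global_batch_size computed above), a tf.data pipeline with prefetching lets the CPU prepare the next batch while the GPUs are busy computing:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset = dataset.shuffle(buffer_size=len(X_train))  # reshuffle the training data
dataset = dataset.batch(global_batch_size)           # batch with the scaled global size
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)  # overlap CPU prep with GPU work

# When fit() receives a tf.data.Dataset, the dataset is already batched,
# so batch_size must not be passed again.
regressor.fit(dataset, epochs=10, callbacks=[cp_callback])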
