GPU OOM: Hyperparameter Tuning loop with varying models


Question

I'm grid-searching hyperparameters using itertools.product() and overwriting the model variable on each iteration. However, on the 2nd iteration it crashes with an Out Of Memory error:

import itertools
import tensorflow as tf
from tensorflow import keras
from keras.losses import sparse_categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam

hyperparameters = {
'lr': [1e-3, 1e-4],
'model': [model1, model2]
}

hps, values = zip(*hyperparameters.items())
for v in itertools.product(*values):
  cur_hps = dict(zip(hps, v))
  model = cur_hps['model'](input_shape = (256, 256, 3))

  optim = Adam(lr = cur_hps['lr'])
  model.compile(optimizer = optim,
                loss = sparse_categorical_crossentropy,
                metrics = ['accuracy'])

  train_gen = myDataGenerator() # returns Sequence

  model.fit_generator(train_gen,
                      epochs = 5,
                      use_multiprocessing = True,
                      workers = 8)

I've tried ending the loop with:

tf.reset_default_graph()
del model
keras.backend.clear_session()

But to no avail, which makes it cumbersome when more than 50 combinations are to be tested. The models have different architectures.

Answer

It seems like there are two possible causes:

  1. Memory is not released after training the previous network
  2. The given model is too large

For the first case, check Keras: release memory after finish training process.
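
A minimal sketch of that approach, reworked from the loop in the question: clear the Keras session before building each model and force garbage collection after training, so the previous graph's GPU allocations can actually be reclaimed. It keeps the question's model1/model2 builders, myDataGenerator and fit_generator-era API as-is; only gc and the placement of clear_session() are new.

import gc
import itertools
from tensorflow import keras

hyperparameters = {
    'lr': [1e-3, 1e-4],
    'model': [model1, model2]       # model-building functions from the question
}

hps, values = zip(*hyperparameters.items())
for v in itertools.product(*values):
    cur_hps = dict(zip(hps, v))

    # Start every trial with a fresh graph/session; otherwise each new model
    # is added on top of the previous one and GPU memory keeps growing.
    keras.backend.clear_session()

    model = cur_hps['model'](input_shape = (256, 256, 3))
    model.compile(optimizer = keras.optimizers.Adam(lr = cur_hps['lr']),
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['accuracy'])

    train_gen = myDataGenerator()   # the question's Sequence-based generator
    model.fit_generator(train_gen,
                        epochs = 5,
                        use_multiprocessing = True,
                        workers = 8)

    # Drop the Python references and collect, so nothing keeps the old
    # model (and its GPU buffers) alive into the next iteration.
    del model, train_gen
    gc.collect()

If memory is still not released, a more drastic but reliable option is to run each trial in its own process (for example with Python's multiprocessing), since all GPU memory is returned to the driver when the process exits.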

For the second case, try decreasing the batch_size in your data generator and see whether that fixes the problem. Alternatively, use multiple GPUs or change the architecture so that it fits into memory.
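
As an illustration of the multi-GPU route, here is a hedged sketch using tf.distribute.MirroredStrategy, assuming TF 2.x tf.keras; cur_hps and the model builders come from the loop above, and the batch_size argument to myDataGenerator is an assumption about the question's generator.

import tensorflow as tf
from tensorflow import keras

# Replicates the model onto every visible GPU; each replica processes a
# slice of the global batch and gradients are combined automatically.
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Model and optimizer must be created inside the strategy scope.
    model = cur_hps['model'](input_shape = (256, 256, 3))
    model.compile(optimizer = keras.optimizers.Adam(learning_rate = cur_hps['lr']),
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['accuracy'])

# A smaller batch also directly shrinks activation memory per step.
train_gen = myDataGenerator(batch_size = 8)   # batch_size kwarg is hypothetical
model.fit(train_gen, epochs = 5)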
