Keras: Out of memory when doing hyper parameter grid search


Problem description

I'm running multiple nested loops to do a hyperparameter grid search. Each nested loop runs through a list of hyperparameter values, and inside the innermost loop a Keras Sequential model is built and evaluated each time using a generator. (I'm not doing any training; I just initialize the model randomly, evaluate it multiple times, and take the average loss.)

My problem is that during this process Keras seems to be filling up my GPU memory, so that I eventually get an OOM error.

Does anybody know how to solve this and free up the GPU memory each time after a model has been evaluated?

I do not need the model at all after it has been evaluated; I can throw it away entirely before building a new one in the next pass of the inner loop.

I'm using the TensorFlow backend.

Here is the code, although much of it isn't relevant to the general problem. The model is built inside the fourth loop:

for fsize in fsizes:

I guess the details of how the model is built don't matter much, but here is all of it anyway:

# Keras 1-style imports assumed by the code below
from keras.models import Sequential
from keras.layers import Lambda, Dense, Flatten, Activation, Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU
import numpy as np

model_losses = []
model_names = []

for activation in activations:
    for i in range(len(layer_structures)):
        for width in layer_widths[i]:
            for fsize in fsizes:

                model_name = "test_{}_struc-{}_width-{}_fsize-{}".format(activation,i,np.array_str(np.array(width)),fsize)
                model_names.append(model_name)
                print("Testing new model: ", model_name)

                #Structure for this network
                structure = layer_structures[i]

                row, col, ch = 80, 160, 3  # Input image format

                model = Sequential()

                model.add(Lambda(lambda x: x/127.5 - 1.,
                          input_shape=(row, col, ch),
                          output_shape=(row, col, ch)))

                for j in range(len(structure)):
                    if structure[j] == 'conv':
                        model.add(Convolution2D(width[j], fsize, fsize))
                        model.add(BatchNormalization(axis=3, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        if activation == 'elu':
                            model.add(ELU())
                        model.add(MaxPooling2D())
                    elif structure[j] == 'dense':
                        if structure[j-1] == 'dense':
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())
                        else:
                            model.add(Flatten())
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())

                model.add(Dense(1))

                average_loss = 0
                for k in range(5):
                    model.compile(optimizer="adam", loss="mse")
                    val_generator = generate_batch(X_val, y_val, resize=(160,80))
                    loss = model.evaluate_generator(val_generator, len(y_val))
                    average_loss += loss

                average_loss /= 5

                model_losses.append(average_loss)

                print("Average loss after 5 initializations: {:.3f}".format(average_loss))
                print()

Recommended answer

As indicated, the backend being used is TensorFlow. With the TensorFlow backend the current model is not destroyed when a new one is built, so you need to clear the session.

After you are done with a model, just put:

if K.backend() == 'tensorflow':
    K.clear_session()

Include the backend:

from keras import backend as K
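
In the question's setup, this means deleting the model and clearing the session at the end of each pass of the innermost loop, before the next configuration is built. Here is a minimal sketch of that pattern; build_model is a hypothetical helper standing in for the Sequential construction shown in the question:

from keras import backend as K

for activation in activations:
    for fsize in fsizes:
        # Hypothetical helper wrapping the model-construction code above
        model = build_model(activation, fsize)
        model.compile(optimizer="adam", loss="mse")
        val_generator = generate_batch(X_val, y_val, resize=(160, 80))
        loss = model.evaluate_generator(val_generator, len(y_val))

        # Discard the model and wipe the TensorFlow graph so GPU memory
        # does not accumulate across grid-search iterations
        del model
        if K.backend() == 'tensorflow':
            K.clear_session()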

You can also use the sklearn wrapper to do the grid search. Check this example: here. Also, for more advanced hyperparameter search, you can use hyperas.
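
For reference, here is a minimal sketch of the sklearn wrapper approach; the build_model function and param_grid below are hypothetical placeholders, not taken from the question:

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

def build_model(width=32, activation='relu'):
    # Hypothetical builder; replace with your own architecture
    model = Sequential()
    model.add(Dense(width, activation=activation, input_dim=10))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

estimator = KerasRegressor(build_fn=build_model, verbose=0)
param_grid = {'width': [16, 32, 64], 'activation': ['relu', 'tanh']}
grid = GridSearchCV(estimator, param_grid, scoring='neg_mean_squared_error')
# grid.fit(X_train, y_train)  # then inspect grid.best_params_ and grid.best_score_

The wrapper handles iterating over the parameter grid itself, so the manual nested loops are no longer needed.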

