Training of Keras model gets slower after each repetition


Question

I'm writing some code to optimize a neural net architecture, and so have a Python function create_nn(parms) that creates and initializes a Keras model. However, the problem I'm having is that after a few iterations the models take a lot longer to train than usual (initially one epoch takes 10 sec, and then after roughly the 14th model (each model trains for 20 epochs) it takes 60 sec/epoch). I know that this is not because of the evolving architecture, because if I restart the script and start where it ended, it is back to normal speeds.

I'm currently running

from keras import backend as K

followed by

K.clear_session()

after training any given new model.
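
For context, the outer search loop looks roughly like this (a minimal sketch; parameter_grid, X_train, and y_train are placeholders standing in for my actual search code):

from keras import backend as K

# Minimal sketch of the architecture search loop.
# parameter_grid, X_train, and y_train are placeholders, not the real code.
for parms in parameter_grid:
    model = create_nn(**parms)                         # build a fresh Keras model
    model.fit(X_train, y_train, epochs=20, verbose=0)  # each model trains for 20 epochs
    K.clear_session()                                  # called after training each model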

Some other details:

  • For the first 12 models, training time per epoch remains roughly constant at 10 sec/epoch. Then at the 13th model, training time per epoch climbs steadily to 60 sec, where it then hovers at around 60 sec/epoch.

  • I'm running Keras with TensorFlow as the backend.

  • I'm using an Amazon EC2 t2.xlarge instance.

  • There is plenty of free RAM (7 GB free, with a dataset of size 5 GB).

I've removed a bunch of layers and parameters, but essentially create_nn looks like:

from keras.layers import (Input, GaussianNoise, Convolution1D, Activation,
                          Flatten, Dense, BatchNormalization, Dropout)
from keras.models import Model

def create_nn(features, timesteps, number_of_filters):
    inputs = Input(shape=(timesteps, features))
    x = GaussianNoise(stddev=0.005)(inputs)
    # Layer 1.1
    x = Convolution1D(number_of_filters, 3, padding='valid')(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = Dense(10)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.5)(x)
    # Output layer
    outputs = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=outputs)

    # Compile and return
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    print('CNN model built successfully.')
    return model

Note that while a Sequential model would've worked in this dummy example, the functional API is required for the actual use case.
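
For comparison, a Sequential version of this dummy model would look like the sketch below (for illustration only, not the actual code):

from keras.models import Sequential
from keras.layers import (GaussianNoise, Convolution1D, Activation,
                          Flatten, Dense, BatchNormalization, Dropout)

# Sequential sketch of the same dummy model, for illustration only.
def create_nn_sequential(features, timesteps, number_of_filters):
    model = Sequential([
        GaussianNoise(stddev=0.005, input_shape=(timesteps, features)),
        Convolution1D(number_of_filters, 3, padding='valid'),
        Activation('relu'),
        Flatten(),
        Dense(10),
        BatchNormalization(),
        Activation('relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model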

How do I fix this problem?

Answer

Why does the training time increase after each run?

Short answer: you need to call tf.keras.backend.clear_session() before every new model that you create.

This problem only occurs when eager execution is turned off.
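
You can check which mode you are in with tf.executing_eagerly(); in TF 2.x eager execution is on by default, and the repro below turns it off explicitly:

import tensorflow as tf

print(tf.executing_eagerly())            # True by default in TF 2.x

# Disabling it (as the repro below does) is what triggers the slowdown:
tf.compat.v1.disable_eager_execution()
print(tf.executing_eagerly())            # now False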

Okay, so let's run an experiment with and without clear_session. The code for make_model is at the end of this response.

First, let's look at the training time when using clear_session. We'll run this experiment 10 times and print the results.

non_seq_time = [ make_model(clear_session=True) for _ in range(10)]

with clear_session = True

non sequential
Elapse =  1.06039
Elapse =  1.20795
Elapse =  1.04357
Elapse =  1.03374
Elapse =  1.02445
Elapse =  1.00673
Elapse =  1.01712
Elapse =    1.021
Elapse =  1.17026
Elapse =  1.04961

As you can see, the training time stays roughly constant.

Now let's re-run the experiment without using clear_session and review the training time.

non_seq_time = [ make_model(clear_session=False) for _ in range(10)]

with clear_session = False

non sequential
Elapse =  1.10954
Elapse =  1.13042
Elapse =  1.12863
Elapse =   1.1772
Elapse =   1.2013
Elapse =  1.31054
Elapse =  1.27734
Elapse =  1.32465
Elapse =  1.32387
Elapse =  1.33252

As you can see, the training time increases without clear_session.

# Training time increases - and how to fix it

# Setup and imports

# %tensorflow_version 2.x

import tensorflow as tf
import tensorflow.keras.layers as layers
import tensorflow.keras.models as models
from time import time

# if you comment this out, the problem doesn't happen
# it only happens when eager execution is disabled !!
tf.compat.v1.disable_eager_execution()


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images to 784-length vectors to match Input(shape=[784])
# below, and scale pixel values to [0, 1].
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0


# Let's build that network
def make_model(activation="relu", hidden=2, units=100, clear_session=False):
    # -----------------------------------
    # .     HERE WE CAN TOGGLE CLEAR SESSION
    # -----------------------------------
    if clear_session:
        tf.keras.backend.clear_session()

    start = time()
    inputs = layers.Input(shape=[784])
    x = inputs

    for _ in range(hidden):
        x = layers.Dense(units=units, activation=activation)(x)

    outputs = layers.Dense(units=10, activation="softmax")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    results = model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=200, verbose=0)
    elapse = time()-start
    print(f"Elapse = {elapse:8.6}")
    return elapse

# Let's try it out and time it

# prime it first
make_model()

print("Use clear session")
non_seq_time = [ make_model(clear_session=True) for _ in range(10)]

print("Don't use clear session")
non_seq_time = [ make_model(clear_session=False) for _ in range(10)]
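
The underlying cause: with eager execution disabled, every model built without clear_session() adds its ops to the same default graph, which keeps growing, and Keras has to work against that ever-larger graph. A quick diagnostic sketch (not part of the original answer) makes the growth visible using the make_model defined above:

# Count the ops in the default graph after each build; without clear_session
# the count grows with every model, with clear_session it stays flat.
for i in range(3):
    make_model(clear_session=False)
    n_ops = len(tf.compat.v1.get_default_graph().get_operations())
    print(f"ops in default graph after model {i + 1}: {n_ops}")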
