Keras exit code -1073741819 (0xC0000005) after training 2 models


Question

I use PyCharm to run my script. The script loops; each iteration: 1. selects a dataset, 2. trains a new Keras model, 3. evaluates that model.

The code worked perfectly for two weeks, but after installing a new Anaconda environment it suddenly fails after two iterations of that loop.

Two Siamese Neural Network models train perfectly fine, then right before the third loop the process crashes with `Process finished with exit code -1073741819 (0xC0000005)`.

 1/32 [..............................] - ETA: 0s - loss: 0.5075
12/32 [==========>...................] - ETA: 0s - loss: 0.5112
27/32 [========================>.....] - ETA: 0s - loss: 0.4700
32/32 [==============================] - 0s 4ms/step - loss: 0.4805
eval run time : 0.046851396560668945

For LOOCV run 2 out of 32. Model is SNN. Time taken for instance = 6.077638149261475
Post-training results: 
acc = 1.0 , ce = 0.6019332906978302 , f1 score = 1.0 , mcc = 0.0
cm = 
[[1]]
####################################################################################################

Process finished with exit code -1073741819 (0xC0000005)

The strange thing is that the code used to work perfectly fine, and even when I switch back from the new Anaconda environment to the environment I used previously, it still exits with the same exit code.

When I use another type of model (a dense neural network), it also crashes, but after 4 iterations. Could this be related to running out of memory? Below is an example of the loop. The exact model does not matter; it always crashes after a certain number of iterations at the train-model line (between "Point 2" and "Point 3").

    # Run k model instances to perform skf
    predicted_labels_store = []
    acc_store = []
    ce_store = []
    f1s_store = []
    mcc_store = []
    folds = []
    val_features_c = []
    val_labels = []
    for fold, fl_tuple in enumerate(fl_store):
        instance_start = time.time()
        (ss_fl, i_ss_fl) = fl_tuple  # ss_fl is training fl, i_ss_fl is validation fl
        if model_mode == 'SNN':
            # Run SNN
            model = SNN(hparams, ss_fl.features_c_dim)
            loader = Siamese_loader(model.siamese_net, ss_fl, hparams)
            loader.train(loader.hparams.get('epochs', 100), loader.hparams.get('batch_size', 32),
                         verbose=loader.hparams.get('verbose', 1))
            predicted_labels, acc, ce, cm, f1s, mcc = loader.eval(i_ss_fl)
            predicted_labels_store.extend(predicted_labels)
            acc_store.append(acc)
            ce_store.append(ce)
            f1s_store.append(f1s)
            mcc_store.append(mcc)
        elif model_mode == 'cDNN':
            # Run DNN
            print('Point 1')
            model = DNN_classifer(hparams, ss_fl)
            print('Point 2')
            model.train_model(ss_fl)
            print('Point 3')
            predicted_labels, acc, ce, cm, f1s, mcc = model.eval(i_ss_fl)
            predicted_labels_store.extend(predicted_labels)
            acc_store.append(acc)
            ce_store.append(ce)
            f1s_store.append(f1s)
            mcc_store.append(mcc)
        del model
        K.clear_session()
        instance_end = time.time()
        if cv_mode == 'skf':
            print('\nFor k-fold run {} out of {}. Model is {}. Time taken for instance = {}\n'
                  'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
                  '####################################################################################################'
                  .format(fold + 1, k_folds, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
        else:
            print('\nFor LOOCV run {} out of {}. Model is {}. Time taken for instance = {}\n'
                  'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
                  '####################################################################################################'
                  .format(fold + 1, fl.count, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
        # Preparing output dataframe that consists of all the validation dataset and its predicted labels
        folds.extend([fold] * i_ss_fl.count)  # Make a col that contains the fold number for each example
        val_features_c = np.concatenate((val_features_c, i_ss_fl.features_c_a),
                                        axis=0) if val_features_c != [] else i_ss_fl.features_c_a
        val_labels.extend(i_ss_fl.labels)
        K.clear_session()

And here is the output for the dense neural network before it crashes.

For LOOCV run 4 out of 32. Model is cDNN. Time taken for instance = 0.7919328212738037
Post-training results: 
acc = 0.0 , ce = 0.7419472336769104 , f1 score = 0.0 , mcc = 0.0
cm = 
[[0 1]
 [0 0]]
####################################################################################################
Point 1
Point 2

Process finished with exit code -1073741819 (0xC0000005)

Any help is greatly appreciated!

Answer

Below is an explanation of the suggestions from the comments that worked, in case anyone faces the same issue.

Manually set the session for Keras at the start of each loop rather than using the default one:

import tensorflow as tf
from keras import backend as K

sess = tf.Session()   # TF 1.x: create an explicit session for this iteration
K.set_session(sess)   # make it the session Keras uses
# ..... train your model
K.clear_session()     # destroy the graph and free the session's resources

Delete the `loader` variable as well: that object must be holding a reference to the original `model` object, since I can see you are calling `train()` on it.

Explicitly collect the memory released by deleting these variables, using `gc.collect()` after each loop, so that there is enough memory to build the next model.
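To see why both `del` and `gc.collect()` matter, here is a minimal stdlib-only sketch (no TensorFlow; `BigModel` is a hypothetical stand-in for a trained model holding a large buffer). A weak reference confirms the object is actually reclaimed once the last strong reference is dropped:

```python
import gc
import weakref

class BigModel:
    """Hypothetical stand-in for a model holding a large weight buffer."""
    def __init__(self):
        self.weights = bytearray(10_000_000)  # ~10 MB of dummy "weights"

model = BigModel()
ref = weakref.ref(model)  # track the object without keeping it alive
del model                 # drop the last strong reference
gc.collect()              # force a collection pass (frees reference cycles too)
print(ref() is None)      # True: the buffer has been reclaimed
```

In CPython, plain reference counting frees the object as soon as `del` removes the last reference; `gc.collect()` is the safety net for reference cycles, which TensorFlow graph objects are prone to forming.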

So, the gist is: when running multiple independent models in a loop like this, make sure you explicitly set the TensorFlow session so that you can clear it after each loop body finishes, releasing all the resources used by that session. Delete all references in the loop that might be tied to TensorFlow objects, and collect the freed memory.
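The per-iteration lifecycle can be sketched with stdlib-only stand-ins (`FakeSession` and `FakeModel` are hypothetical placeholders for `tf.Session` and a Keras model, used only to show the ordering of the cleanup steps; in real code the commented steps map to `tf.Session()`, `K.set_session()`, `del model, loader`, `K.clear_session()`, and `gc.collect()`):

```python
import gc

class FakeSession:
    """Stand-in for tf.Session; counts live sessions to check for leaks."""
    _active = 0
    def __init__(self):
        FakeSession._active += 1
    def close(self):
        FakeSession._active -= 1

class FakeModel:
    """Stand-in for a Keras model; holds a reference to its session."""
    def __init__(self, session):
        self.session = session

results = []
for fold in range(3):
    sess = FakeSession()     # 1. create a fresh session for this fold
    model = FakeModel(sess)  # 2. build, train, and evaluate the model
    results.append(fold)
    del model                # 3. drop every reference tied to the session
    sess.close()             # 4. release the session's resources
    del sess
    gc.collect()             # 5. reclaim freed memory before the next fold

print(FakeSession._active)   # 0: no sessions leak across iterations
```

The key design point is that steps 3-5 run inside the loop, not after it, so each iteration starts from a clean slate instead of accumulating graph state until the process runs out of memory and dies with 0xC0000005 (an access violation).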

