Extremely slow when saving a model on Colab TPU

Question

Saving a model is extremely slow in the Colab TPU environment.

I first ran into this when using a checkpoint callback, which left training stuck at the end of the first epoch.

Then I tried removing the callback and saving the model directly with model.save_weights(), but nothing changed. Using the Colab terminal, I found that the saving speed was only about 100k per 5 minutes.

TensorFlow version: 2.3

My model-fitting code is here:

from tensorflow import keras

# tpu_strategy, create_model and get_train_ds are defined elsewhere (not shown).
# Creating the model inside the TPUStrategy scope means it is trained on the TPU.
with tpu_strategy.scope():
    model = create_model()
    # Save only the weights at the end of every epoch.
    checkpoint = keras.callbacks.ModelCheckpoint('baseline_{epoch:03d}.h5',
                                                 save_weights_only=True, save_freq="epoch")

    hist = model.fit(get_train_ds().repeat(),
                     steps_per_epoch=100,
                     epochs=5,
                     verbose=1,
                     callbacks=[checkpoint])

    model.save_weights("epoch-test.h5", overwrite=True)

Recommended answer

I found that the issue happened because I had explicitly switched to graph mode by writing

from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

before

with tpu_strategy.scope():
    model.fit(...)

Although I still don't understand the cause, removing disable_eager_execution solved the issue.
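
To catch this situation early, a small guard can be run before training. This is a minimal sketch under assumptions, not something from the original answer: require_eager is a hypothetical helper that takes the check as a callable, so that tf.executing_eagerly (a real TensorFlow 2.x function) can be passed in while the snippet itself runs without TensorFlow installed.

```python
from typing import Callable


def require_eager(executing_eagerly: Callable[[], bool]) -> None:
    """Raise RuntimeError if the supplied check reports eager execution is off."""
    if not executing_eagerly():
        raise RuntimeError(
            "Eager execution is disabled; remove disable_eager_execution() "
            "before training and saving under tpu_strategy.scope()."
        )


# Intended use in the notebook (assumes TensorFlow 2.x; not executed here):
#   import tensorflow as tf
#   require_eager(tf.executing_eagerly)
#   with tpu_strategy.scope():
#       model.fit(...)
```

If disable_eager_execution() has been called earlier in the notebook, the guard should fail immediately instead of letting training reach the slow save at the end of the first epoch.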
