Saving the state of the AdaGrad algorithm in TensorFlow


Problem description

I am trying to train a word2vec model, and want to use the embeddings for another application. As there might be extra data later, and my computer is slow when training, I would like my script to stop and resume training later.

To do this, I created a saver:

saver = tf.train.Saver({"embeddings": embeddings,"embeddings_softmax_weights":softmax_weights,"embeddings_softmax_biases":softmax_biases})

I save the embeddings, and softmax weights and biases so I can resume training later. (I assume that this is the correct way, but please correct me if I'm wrong).

Unfortunately when resuming training with this script the average loss seems to go up again.

My idea is that this can be attributed to the AdaGradOptimizer I'm using. Initially the outer product matrix will probably be set to all zeros, whereas after my training it will be filled (leading to a lower learning rate).
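The intuition above can be checked with a minimal pure-Python sketch of the AdaGrad update rule (hypothetical names, not TensorFlow's internal implementation): the per-parameter accumulator grows monotonically, so the effective step size shrinks over time, and restarting with a zeroed accumulator jumps back to large steps.

```python
import math

def adagrad_step(param, grad, accum, lr=0.5, eps=1e-8):
    """One AdaGrad update: accumulate squared gradients, then scale
    the step by 1/sqrt(accumulator), shrinking it over time."""
    accum += grad * grad
    param -= lr * grad / (math.sqrt(accum) + eps)
    return param, accum

param, accum = 1.0, 0.0
steps = []
for _ in range(5):
    new_param, accum = adagrad_step(param, grad=1.0, accum=accum)
    steps.append(param - new_param)  # effective step size this iteration
    param = new_param

# The effective steps shrink as the accumulator fills (0.5, ~0.35, ~0.29, ...).
# Resuming training with accum reset to 0 would repeat the large early steps,
# consistent with the observed spike in average loss after restarting.
```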

Is there a way to save the optimizer state to resume learning later?

Recommended answer

While TensorFlow seems to complain when you attempt to serialize an optimizer object directly (e.g. via tf.add_to_collection("optimizers", optimizer) and a subsequent call to tf.train.Saver().save()), you can save and restore the training update operation which is derived from the optimizer:

# init
if not load_model:
    optimizer = tf.train.AdamOptimizer(1e-4)
    train_step = optimizer.minimize(loss)
    tf.add_to_collection("train_step", train_step)
else:
    saver = tf.train.import_meta_graph(modelfile + '.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    train_step = tf.get_collection("train_step")[0]

# training loop
saver = tf.train.Saver()  # parameterless: saves all variables, incl. optimizer slots
while training:
    if iteration % save_interval == 0:
        save_path = saver.save(sess, filepath)

I do not know of a way to get or set the parameters specific to an existing optimizer, so I do not have a direct way of verifying that the optimizer's internal state was restored, but training resumes with loss and accuracy comparable to when the snapshot was created. I would also recommend using the parameterless call to Saver() so that state variables not specifically mentioned will still be saved, although this might not be strictly necessary.

You may also wish to save the iteration or epoch number for later restoring, as detailed in this example: http://www.seaandsailor.com/tensorflow-checkpointing.html
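As a framework-agnostic illustration of that idea (hypothetical helpers, not the TF checkpoint API), the epoch counter can simply be serialized alongside the model state, so the training loop resumes from the saved position rather than epoch 0:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, params, accum, epoch):
    """Bundle model parameters, optimizer accumulator, and epoch counter."""
    with open(path, "wb") as f:
        pickle.dump({"params": params, "accum": accum, "epoch": epoch}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
save_checkpoint(path, params=[0.1, 0.2], accum=[3.0, 4.0], epoch=7)

state = load_checkpoint(path)
start_epoch = state["epoch"] + 1  # resume here instead of restarting at 0
```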

