Python TensorFlow: How to restart training with optimizer and import_meta_graph?

Question

I'm trying to resume training a model in TensorFlow by picking up where it left off. I'd like to use the recently added (0.12+, I think) import_meta_graph() so that I don't have to reconstruct the graph.

I've seen solutions for this, e.g. "Tensorflow: How to save/restore a model?", but I run into issues with AdamOptimizer. Specifically, I get this error: ValueError: cannot add op with name <my weights variable name>/Adam as that name is already used. This can be fixed by initializing the variables, but then my model values are cleared!

There are other answers and some full examples out there, but they always seem to be older and so don't include the newer import_meta_graph() approach, or don't have a non-tensor optimizer. The closest question I could find is "tensorflow: saving and restoring session", but it has no clear-cut final solution and the example is pretty complicated.

Ideally I'd like a simple runnable example that starts from scratch, stops, then picks up again. I have something that works (below), but I also wonder whether I'm missing something. Surely I'm not the only one doing this?

Recommended answer

Here is what I came up with from reading the docs, other similar solutions, and trial and error. It's a simple autoencoder on random data. If you run it and then run it again, it continues from where it left off (i.e. the cost function goes from ~0.5 to ~0.3 on the first run, and the second run starts at ~0.3). Unless I missed something, all of the saving, constructors, model building, and add_to_collection calls are needed, and in that precise order, but there may be a simpler way.

And yes, loading the graph with import_meta_graph isn't really needed here since the code that builds it is in the same script, but it is what I want in my actual application.

from __future__ import print_function
import tensorflow as tf
import os
import math
import numpy as np

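output_dir = "/root/Data/temp"
if not os.path.isdir(output_dir):
    os.makedirs(output_dir)  # make sure the checkpoint directory exists before saving
model_checkpoint_file_base = os.path.join(output_dir, "model.ckpt")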

input_length = 10
encoded_length = 3
learning_rate = 0.001
n_epochs = 10
n_batches = 10
if not os.path.exists(model_checkpoint_file_base + ".meta"):
    print("Making new")
    brand_new = True

    x_in = tf.placeholder(tf.float32, [None, input_length], name="x_in")
    W_enc = tf.Variable(tf.random_uniform([input_length, encoded_length],
                                          -1.0 / math.sqrt(input_length),
                                          1.0 / math.sqrt(input_length)), name="W_enc")
    b_enc = tf.Variable(tf.zeros(encoded_length), name="b_enc")
    encoded = tf.nn.tanh(tf.matmul(x_in, W_enc) + b_enc, name="encoded")
    W_dec = tf.transpose(W_enc, name="W_dec")
    b_dec = tf.Variable(tf.zeros(input_length), name="b_dec")
    decoded = tf.nn.tanh(tf.matmul(encoded, W_dec) + b_dec, name="decoded")
    cost = tf.sqrt(tf.reduce_mean(tf.square(decoded - x_in)), name="cost")

    saver = tf.train.Saver()
else:
    print("Reloading existing")
    brand_new = False
    saver = tf.train.import_meta_graph(model_checkpoint_file_base + ".meta")
    g = tf.get_default_graph()
    x_in = g.get_tensor_by_name("x_in:0")
    cost = g.get_tensor_by_name("cost:0")


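sess = tf.Session()
if brand_new:
    # minimize() creates Adam's slot variables, so it must run before the
    # initializer and before the Saver that will checkpoint them.
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    sess.run(init)
    # Store the train op in a collection so it can be found after import_meta_graph().
    tf.add_to_collection("optimizer", optimizer)
    # Recreate the Saver now that the Adam variables exist; the one built
    # earlier only knows about the model variables.
    saver = tf.train.Saver()
else:
    # Restores the model weights and the Adam state; no initializer is run.
    saver.restore(sess, model_checkpoint_file_base)
    optimizer = tf.get_collection("optimizer")[0]

for epoch_i in range(n_epochs):
    for batch_i in range(n_batches):
        batch = np.random.rand(50, input_length)
        _, curr_cost = sess.run([optimizer, cost], feed_dict={x_in: batch})
        print("batch_cost:", curr_cost)
        # Reuse one Saver; building a new one per batch adds ops to the graph each time.
        save_path = saver.save(sess, model_checkpoint_file_base)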
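
The "resume or start fresh" pattern in the script above can also be seen without TensorFlow. The following is only a toy sketch under stated assumptions: the names adam_step and train, the cost w**2, and the JSON file standing in for the checkpoint are all hypothetical and are not part of the TensorFlow API. It illustrates why the optimizer's state (Adam's moment estimates and step count) has to be saved and restored together with the weights if a second run is to continue exactly where the first one stopped.

```python
import json
import math
import os
import tempfile


def adam_step(w, m, v, t, grad, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns the new (w, m, v, t)."""
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v, t


def train(ckpt_path, n_steps):
    """Resume from ckpt_path if it exists, else start fresh; train, then save."""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # weight AND optimizer state come back
    else:
        state = {"w": 5.0, "m": 0.0, "v": 0.0, "t": 0}  # brand-new run
    w, m, v, t = state["w"], state["m"], state["v"], state["t"]
    for _ in range(n_steps):
        grad = 2.0 * w  # gradient of the toy cost w**2
        w, m, v, t = adam_step(w, m, v, t, grad)
    with open(ckpt_path, "w") as f:
        json.dump({"w": w, "m": m, "v": v, "t": t}, f)
    return w


tmp_dir = tempfile.mkdtemp()
# One uninterrupted run of 20 steps.
straight = train(os.path.join(tmp_dir, "a.json"), 20)
# Two runs of 10 steps each, sharing one checkpoint file.
resumed_path = os.path.join(tmp_dir, "b.json")
train(resumed_path, 10)
resumed = train(resumed_path, 10)
print(straight, resumed)
```

Dropping m, v, and t from the saved state would correspond to re-creating or re-initializing the optimizer on reload: the weight would survive, but the second run would no longer follow the same trajectory as an uninterrupted one.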
