Tensorflow save final state of LSTM in dynamic_rnn for prediction

Problem description

I want to save the final state of my LSTM such that it's included when I restore the model and can be used for prediction. As explained below, the Saver only has knowledge of the final state when I use tf.assign. However, this throws an error (also explained below).

During training I always feed the final LSTM state back into the network, as explained in this post. Here are the important parts of the code:

When building the graph:

            self.init_state = tf.placeholder(tf.float32, [
                self.n_layers, 2, self.batch_size, self.n_hidden
            ])

            state_per_layer_list = tf.unstack(self.init_state, axis=0)

            rnn_tuple_state = tuple([
                tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                              state_per_layer_list[idx][1])

                for idx in range(self.n_layers)
            ])

            outputs, self.final_state = tf.nn.dynamic_rnn(
                cell, inputs=self.inputs, initial_state=rnn_tuple_state)

During training:

        _current_state = np.zeros((self.n_layers, 2, self.batch_size,
                                   self.n_hidden))

            # (loss/accuracy fetches are omitted from this excerpt)
            _train_step, _current_state, summary = self.sess.run(
                [
                    self.train_step, self.final_state,
                    self.merged
                ],
                feed_dict={self.inputs: _inputs,
                           self.labels: _labels,
                           self.init_state: _current_state})

When I later restore my model from a checkpoint, the final state is not restored as well. As outlined in this post, the problem is that the Saver has no knowledge of the new state. The post also suggests a solution based on tf.assign. Regrettably, I cannot use the suggested

            assign_op = tf.assign(self.init_state, _current_state)
            self.sess.run(assign_op)

because self.init_state is not a Variable but a placeholder. I get the error:

AttributeError: 'Tensor' object has no attribute 'assign'
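
(A note on why this fails: in TF 1.x, tf.assign is only defined for variables; a tf.placeholder is an ordinary Tensor that is fed at run time and has no assign method, which is exactly the AttributeError above. A minimal sketch with made-up names, just to illustrate the distinction:)

    import tensorflow as tf

    var = tf.Variable(tf.zeros([2, 2]), name='state_var')     # variables can be assigned to
    ph = tf.placeholder(tf.float32, [2, 2], name='state_ph')  # plain Tensor, no assign method

    assign_var = tf.assign(var, [[1.0, 2.0], [3.0, 4.0]])     # works
    # tf.assign(ph, ...)  # would raise: 'Tensor' object has no attribute 'assign'

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(assign_var)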

I have tried to solve this problem for several hours now but I can't get it to work.

Thanks for your help!

I have changed self.init_state to

            self.init_state = tf.get_variable(
                'saved_state',
                shape=[self.n_layers, 2, self.batch_size, self.n_hidden])

            state_per_layer_list = tf.unstack(self.init_state, axis=0)

            rnn_tuple_state = tuple([
                tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                              state_per_layer_list[idx][1])

                for idx in range(self.n_layers)
            ])

            outputs, self.final_state = tf.nn.dynamic_rnn(
                cell, inputs=self.inputs, initial_state=rnn_tuple_state)

And during training I don't feed a value for self.init_state:

            # (loss/accuracy fetches are omitted from this excerpt)
            _train_step, _current_state, summary = self.sess.run(
                [
                    self.train_step, self.final_state,
                    self.merged
                ],
                feed_dict={self.inputs: _inputs,
                           self.labels: _labels})

However, I still can't run the assignment op. Now I get:

TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ..., 0.05307654,
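
(For what it's worth, the error message itself points at the likely cause: with state_is_tuple=True, sess.run(self.final_state) returns a tuple of LSTMStateTuples rather than a single array, while tf.assign expects a dense float32 value matching the variable's shape. Packing the tuple first, e.g. with np.asarray, should produce the [n_layers, 2, batch_size, n_hidden] layout the variable expects. A sketch with made-up sizes, not the original code:)

    import numpy as np
    import tensorflow as tf

    # Made-up sizes, for illustration only.
    N_LAYERS, BATCH_SIZE, N_HIDDEN = 2, 32, 128

    saved_state = tf.get_variable(
        'saved_state', shape=[N_LAYERS, 2, BATCH_SIZE, N_HIDDEN])

    # Stand-in for what sess.run(self.final_state, ...) returns with
    # state_is_tuple=True: one LSTMStateTuple(c, h) of numpy arrays per layer.
    _current_state = tuple(
        tf.contrib.rnn.LSTMStateTuple(
            c=np.zeros((BATCH_SIZE, N_HIDDEN), dtype=np.float32),
            h=np.zeros((BATCH_SIZE, N_HIDDEN), dtype=np.float32))
        for _ in range(N_LAYERS))

    # Packing the nested tuple yields a dense [N_LAYERS, 2, BATCH_SIZE, N_HIDDEN]
    # array, which tf.assign accepts.
    dense_state = np.asarray(_current_state)
    assign_op = tf.assign(saved_state, dense_state)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(assign_op)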

Recommended answer

In order to save the final state, you can create a separate TF variable, then before saving the graph, run an assign op to assign your latest state to that variable, and then save the graph. The only thing you need to keep in mind is to declare that variable BEFORE you declare the Saver; otherwise it won't be included in the graph.
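
(A condensed, self-contained sketch of that recipe, with made-up sizes and paths, just to show the ordering: the state variable is created before the Saver, the latest state is assigned, and only then is the checkpoint written:)

    import numpy as np
    import tensorflow as tf

    LAYERS, BATCH, CELL = 2, 4, 8          # made-up sizes
    latest_state = np.random.rand(LAYERS, 2, BATCH, CELL).astype(np.float32)

    saved_state = tf.get_variable('savedState', shape=[LAYERS, 2, BATCH, CELL])
    saver = tf.train.Saver(max_to_keep=1)  # declared AFTER the variable, so it gets tracked

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.assign(saved_state, latest_state))   # push the latest state into the variable
        saver.save(sess, '/tmp/my_model.ckpt')           # the state is now part of the checkpoint

    # In a separate program, the value comes back together with the rest of the graph:
    #   new_saver = tf.train.import_meta_graph('/tmp/my_model.ckpt.meta')
    #   new_saver.restore(sess, '/tmp/my_model.ckpt')
    #   state = sess.run('savedState:0')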

This is discussed in great detail here, including the working code: TF LSTM: Save State from training session for prediction session later

*** UPDATE: answers to followup questions:

It looks like you are using BasicLSTMCell, with state_is_tuple=True. The prior discussion that I referred you to used GRUCell with state_is_tuple=False. The details between the two are somewhat different, but the overall approach could be similar, so hopefully this should work for you:

During training, you first feed zeros as initial_state into dynamic_rnn and then keep re-feeding its own output back as input as initial_state. So, the LAST output state of our dynamic_rnn call is what you want to save for later. Since it results from a sess.run() call, essentially it's a numpy array (not a tensor and not a placeholder). So the question amounts to "how do I save a numpy array as a Tensorflow variable along with the rest of the variables in the graph." That's why you assign the final state to a variable whose only purpose is that.

So, the code is something like this:

    # GRAPH DEFINITIONS:
    state_in = tf.placeholder(tf.float32, [LAYERS, 2, None, CELL_SIZE], name='state_in')
    l = tf.unstack(state_in, axis=0)
    state_tup = tuple(
        [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
        for idx in range(LAYERS)])
    #multicell = your BasicLSTMCell / MultiRNN definitions
    output, state_out = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=state_tup)

    savedState = tf.get_variable('savedState', shape=[LAYERS, 2, BATCHSIZE, CELL_SIZE])
    saver = tf.train.Saver(max_to_keep=1)

    in_state = np.zeros((LAYERS, 2, BATCHSIZE, CELL_SIZE))

    # TRAINING LOOP:
    feed_dict = {X: x, Y_: y_, batchsize: BATCHSIZE, state_in:in_state}
    _, out_state = sess.run([training_step, state_out], feed_dict=feed_dict)
    in_state = out_state

    # ONCE TRAINING IS OVER:
    assignOp = tf.assign(savedState, out_state)
    sess.run(assignOp)
    saver.save(sess, pathModel + '/my_model.ckpt')

    # RECOVERING IN A DIFFERENT PROGRAM:

    gInit = tf.global_variables_initializer().run()
    lInit = tf.local_variables_initializer().run()
    new_saver = tf.train.import_meta_graph(pathModel + '/my_model.ckpt.meta')
    new_saver.restore(sess, pathModel + '/my_model.ckpt')
    # retrieve State and get its LAST batch (latest observations)
    savedState = sess.run('savedState:0') # this is FULL state from training
    state = savedState[:,:,-1,:]  # -1 gets only the LAST batch of the state (latest seen observations)
    state = np.reshape(state, [state.shape[0], 2, -1, state.shape[2]])  # [LAYERS, 2, 1 (BATCH), CELL_SIZE]
    #x = .... (YOUR INPUTS)
    feed_dict = {'X:0': x, 'state_in:0':state}
    #PREDICTION LOOP:
    preds, state = sess.run(['preds:0', 'state_out:0'], feed_dict = feed_dict)
    # so now state will be re-fed into feed_dict with the next loop iteration

As mentioned, this is a modified approach of what works well for me with GRUCell, where state_is_tuple = False. I adapted it to try BasicLSTMCell with state_is_tuple=True. It works, but not as accurately as the original approach. I don't know yet whether that's just because for me GRU works better than LSTM, or for some other reason. See if this works for you...

Also keep in mind that, as you can see in the recovery and prediction code, your predictions will likely be based on a different batch size than your training loop (I guess a batch of 1?). So you have to think through how to handle your recovered state -- just take the last batch? Or something else? This code takes only the last batch of the saved state (i.e. the most recent observations from training), because that's what was relevant for me...
