Making a trainable initial state for an LSTM in TensorFlow


Problem description

I have a sequence which is too long to fit in memory, but the initial state is quite critical so I would like to train that as a variable too. How can I train the initial state variable to pass in at the start of the sequence, but keep using the output state for the rest of the sequence?

This is what I've got so far:

    import tensorflow as tf

    # num_lstm_cells, batch_size and lstm_input are assumed to be defined elsewhere.
    cell = tf.contrib.rnn.BasicLSTMCell(num_lstm_cells, state_is_tuple=True)

    # Trainable initial state that should be learned.
    init_vars = cell.zero_state(batch_size, tf.float32)
    init_c = tf.Variable(init_vars.c, trainable=True)
    init_h = tf.Variable(init_vars.h, trainable=True)
    init_state = tf.contrib.rnn.LSTMStateTuple(init_c, init_h)

    # Non-trainable state that is actually fed to the RNN and carried between chunks.
    state_vars = cell.zero_state(batch_size, tf.float32)
    state_c = tf.Variable(state_vars.c, trainable=False)
    state_h = tf.Variable(state_vars.h, trainable=False)
    state = tf.contrib.rnn.LSTMStateTuple(state_c, state_h)

    layer = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.7)
    val, new_state = tf.nn.dynamic_rnn(layer, lstm_input, initial_state=state, dtype=tf.float32)

    # Write the output state back into the state variables before exposing the output.
    with tf.control_dependencies([state[0].assign(new_state[0]), state[1].assign(new_state[1])]):
        output = tf.identity(val)

    # Op that resets the carried state to the trainable initial state at the start of a sequence.
    initialise_c = tf.assign(state[0], init_state[0])
    initialise_h = tf.assign(state[1], init_state[1])
    initialise_state = tf.group(initialise_c, initialise_h)

The idea is that I have a trainable initial state variable (init_vars) and a non-trainable state (state_vars), to which I assign the initial state at the start of each sequence by calling the initialise_state op.

I don't think this will work, though, because init_state isn't actually part of the training; it is only being used for copying. How can I do this?

Edit: I've confirmed in testing that the initial state is not being trained and remains all zeros.
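One way to see why, as a minimal sketch: the loss only ever reads the non-trainable state variables, and init_c/init_h feed the graph solely through tf.assign ops that are not part of the loss computation, so TensorFlow finds no gradient for them. Here `loss` is a hypothetical scalar loss op built on `output`, not something from the snippet above:

    # Sanity check (loss is a hypothetical scalar loss built on `output` above).
    # The gradients come back as None because init_c/init_h only reach the graph
    # through tf.assign ops, which are not on the path to the loss.
    grads = tf.gradients(loss, [init_c, init_h])
    print(grads)  # [None, None]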

Answer

I ended up solving this by creating the initial state variable inside a separate variable scope. Then, using the optional var_list parameter of Optimizer.minimize(), I could specify that the initial state is trained only at the start of each sequence. After training the initial state, I copy it into a separate set of non-trainable backup variables, and train the graph for the rest of the sequence.

    with tf.variable_scope("state"):
        state_c = tf.Variable(tf.random_uniform([batch_size, num_lstm_cells], 0, 1), trainable=True)
        state_h = tf.Variable(tf.random_uniform([batch_size, num_lstm_cells], 0, 1), trainable=True)
        state = tf.contrib.rnn.LSTMStateTuple(state_c, state_h)

    with tf.variable_scope("nn"):
        layer = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.7)
        val, new_state = tf.nn.dynamic_rnn(layer, lstm_input, initial_state=state, dtype=tf.float32)

        logits = tf.layers.dense(val, units=5, activation=tf.nn.relu)
        losses = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=targets)

    init_c = tf.Variable(tf.zeros([batch_size, num_lstm_cells]), trainable=False)
    init_h = tf.Variable(tf.zeros([batch_size, num_lstm_cells]), trainable=False)
    init_state = tf.contrib.rnn.LSTMStateTuple(init_c, init_h)

    restore_c = tf.assign(state[0], init_state[0])
    restore_h = tf.assign(state[1], init_state[1])
    restore_state = tf.group([restore_c, restore_h])

    save_c = tf.assign(init_state[0], state[0])
    save_h = tf.assign(init_state[1], state[1])
    save_state = tf.group([save_c, save_h])

    propagate_c = tf.assign(state[0], new_state[0])
    propagate_h = tf.assign(state[1], new_state[1])
    propagate_state = tf.group([propagate_c, propagate_h])

    nn_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "nn")
    state_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "state")

    total_loss = tf.reduce_mean(losses)

    train_nn_step = tf.train.AdamOptimizer().minimize(total_loss, var_list=nn_vars)
    train_nn_state_step = tf.train.AdamOptimizer().minimize(total_loss, var_list=[nn_vars, state_vars])

So you start a sequence by calling:

  1. sess.run(restore_state) to copy the initial state back to the graph
  2. _, er = sess.run([train_nn_state_step, error]) to train the initial state and nn
  3. sess.run(save_state) to save the initial state
  4. sess.run(propagate_state) to propagate the state to the next train step

And you train the rest of the sequence by calling (a rough end-to-end loop is sketched after this list):

  1. _, er = sess.run([train_nn_step, error]) to just train the neural network
  2. sess.run(propagate_state) to keep passing the state through
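
Putting the two lists together, a rough end-to-end loop could look like the sketch below; sequences, num_chunks, make_feed and error are illustrative placeholders for however the data is chunked and the error op is defined, not part of the original code:

    # Sketch of the loop described above; sequences, num_chunks, make_feed and error
    # are placeholders, not part of the original graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for sequence in sequences:
            # Start of a sequence: restore the saved initial state and train it
            # together with the network on the first chunk.
            sess.run(restore_state)
            _, er = sess.run([train_nn_state_step, error], feed_dict=make_feed(sequence, 0))
            sess.run(save_state)       # keep the updated initial state for the next sequence
            sess.run(propagate_state)  # carry the LSTM state into the next chunk

            # Rest of the sequence: train only the network, threading the state through.
            for chunk in range(1, num_chunks):
                _, er = sess.run([train_nn_step, error], feed_dict=make_feed(sequence, chunk))
                sess.run(propagate_state)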

