Setting the hidden state for each minibatch with different hidden sizes and multiple LSTM layers in Keras


Problem Description

I created an LSTM using Keras with TensorFlow as the backend. Before a minibatch with num_steps of 96 is fed to training, the hidden state of the LSTM is set to the true values of the previous time step.

First, the parameters and data:

import numpy as np

batch_size = 10
num_steps = 96
num_input = num_output = 2
hidden_size = 8

# X_train, Y_train, X_test, Y_test are assumed to be loaded beforehand
X_train = np.array(X_train).reshape(-1, num_steps, num_input)
Y_train = np.array(Y_train).reshape(-1, num_steps, num_output)
X_test = np.array(X_test).reshape(-1, num_steps, num_input)
Y_test = np.array(Y_test).reshape(-1, num_steps, num_output)

The Keras model consists of two LSTM layers and one layer that trims the output to num_output, which is 2:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(hidden_size, batch_input_shape=(batch_size, num_steps, num_input),
               return_sequences=True, stateful=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_output, activation='softmax')))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

The generator, as well as the training (hidden_states[x] has shape (2,)):

from keras import backend as K

def gen_data():
    x = np.zeros((batch_size, num_steps, num_input))
    y = np.zeros((batch_size, num_steps, num_output))
    while True:
        for i in range(batch_size):
            model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx])  # hidden_states[x] has shape (2,)
            x[i, :, :] = X_train[gen_data.current_idx]
            y[i, :, :] = Y_train[gen_data.current_idx]
            gen_data.current_idx += 1
        yield x, y
gen_data.current_idx = 0


for epoch in range(100):
    model.fit_generator(gen_data(), len(X_train)//batch_size, 1,
                        validation_data=None, max_queue_size=1, shuffle=False)
    gen_data.current_idx = 0

This code does not give me an error, but I have two questions about it:

1) Inside the generator I set the hidden state of the LSTM, model.layers[0].states[0], to a variable built from hidden_states[gen_data.current_idx], which has shape (2,). Why is this possible for an LSTM with a hidden size greater than 2?

2) The values in hidden_states[gen_data.current_idx] could also be an output of the Keras model. Does it make sense to set the hidden state of a two-layer LSTM in this way?

Recommended Answer

States in the LSTM

An LSTM is composed of gates that compute a cell state and a hidden state.

In the usual LSTM diagram, the top arrow coming out of the right of the LSTM is the cell state (c_t) and the bottom arrow is the hidden state (h_t). The cell state is the result of gated manipulation, and the state size is the same as the hidden_size of the LSTM. Every unrolling (with its corresponding input X) results in its own cell state. In the case of an LSTM, the state is composed of two values: the hidden_state (h_t) of shape (batch_size x hidden_size) and the cell_state (c_t) of shape (batch_size x hidden_size).

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

batch_size = 2
num_steps = 5
num_input = num_output = 1
hidden_size = 8

inputs = Input(batch_shape=(batch_size, num_steps, num_input))
lstm, state_h, state_c = LSTM(hidden_size, return_state=True, return_sequences=True)(inputs)
model = Model(inputs=inputs, outputs=[state_h, state_c])

print(model.predict(np.zeros((batch_size, num_steps, num_input))))
print(model.layers[1].cell.state_size)

Note: In the case of GRU/RNN there is no cell state, only a hidden state, so the state in that case is just h_t of size (batch_size, hidden_size).


From the Keras documentation:

the number of state tensors is 1 (for RNN and GRU) or 2 (for LSTM).
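As a quick check of that statement (a minimal sketch of my own, not from the original answer), calling an LSTM and a GRU with return_state=True shows two state tensors for the LSTM and one for the GRU:

from keras.layers import Input, LSTM, GRU

inputs = Input(shape=(5, 1))
lstm_outputs = LSTM(8, return_state=True)(inputs)  # [output, h_t, c_t]
gru_outputs = GRU(8, return_state=True)(inputs)    # [output, h_t]

print(len(lstm_outputs) - 1)  # 2 state tensors for the LSTM
print(len(gru_outputs) - 1)   # 1 state tensor for the GRU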


In your example, layers[0] refers to the 1st LSTM and layers[1] refers to the 2nd LSTM. If your intention is to initialise the cell state (c_t) of the nth batch from the cell state of batch (n-1), i.e. the previous batch, there are two options:

  • The way you are doing it in the generator, but use states[1] if you want c_t and states[0] for h_t. Similarly, use layers[0] for the 1st LSTM and layers[1] for the 2nd LSTM. But use the set_value method instead; see the edit below and the sketch after this list.

  • Use Keras stateful=True: with stateful set to True, the LSTM states are not reset after every batch. So if you have a batch with 5 data samples (each of some sequence length), you will get a cell state for each of the 5 data samples. With stateful set to True, these states are used to initialise the cell state for the next batch.
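A minimal sketch of the first option, assuming hypothetical arrays hidden_states_h and hidden_states_c whose entries already have the full (batch_size, hidden_size) shape expected by the state variables (unlike the (2,) vectors in the question):

import numpy as np
from keras import backend as K

def gen_data():
    x = np.zeros((batch_size, num_steps, num_input))
    y = np.zeros((batch_size, num_steps, num_output))
    while True:
        # overwrite the state variables in place instead of rebinding the attribute;
        # states[0] is h_t and states[1] is c_t of the first LSTM layer
        K.set_value(model.layers[0].states[0], hidden_states_h[gen_data.current_idx])
        K.set_value(model.layers[0].states[1], hidden_states_c[gen_data.current_idx])
        for i in range(batch_size):
            x[i, :, :] = X_train[gen_data.current_idx]
            y[i, :, :] = Y_train[gen_data.current_idx]
            gen_data.current_idx += 1
        yield x, y
gen_data.current_idx = 0

Here the state is set once per yielded batch rather than once per sample, since the state variables already carry the batch dimension.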

Edit: The method set_value should be used to set the value of a tensor variable. The line model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) runs without error only because it rebinds states[0], which was pointing to a variable of size (batch_size x hidden_size), to a new variable of size (batch_size x 2). It does not change the value of the tensor variable; it just makes the attribute point to a new tensor variable of a different dimension.

Test code:

print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))
model.layers[0].states[0] = K.variable(np.random.randn(10, 2))
print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))

Output:

<tf.Variable 'lstm_18/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f8812e6ee10
<tf.Variable 'Variable_2:0' shape=(10, 2) dtype=float32_ref> 0x7f881269afd0

As you can see, they are two different variables. The correct way to do this is:

print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))
K.set_value(model.layers[0].states[0], np.random.randn(10, 8))
print(model.layers[0].states[0], hex(id(model.layers[0].states[0])))

Output:

<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70

If your code is fixed in this way, then

K.set_value(model.layers[0].states[0], np.random.randn(10, 2))

will throw an error, because the size of the tensor and the size of the value you are setting do not match.
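If you still want to seed the state from the (2,) ground-truth vectors of the question, one hypothetical workaround (a sketch, assuming zero-padding the remaining hidden units is acceptable for your problem) is to embed them into an array of the full state shape before calling set_value:

import numpy as np
from keras import backend as K

true_values = np.array([0.3, -0.7])         # hypothetical stand-in for hidden_states[gen_data.current_idx]
h0 = np.zeros((batch_size, hidden_size))    # full state shape, here (10, 8)
h0[:, :2] = true_values                     # place the two known values, the rest stays zero
K.set_value(model.layers[0].states[0], h0)  # shapes now match, so no error is raised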
