Setting the hidden state for each minibatch with different hidden sizes and multiple LSTM layers in Keras
Question
I created an LSTM using Keras with TensorFlow as the backend. Before a minibatch with a num_steps of 96 is passed to training, the hidden state of the LSTM is set to the true values of a previous time step.
First, the parameters and data:
batch_size = 10
num_steps = 96
num_input = num_output = 2
hidden_size = 8
X_train = np.array(X_train).reshape(-1, num_steps, num_input)
Y_train = np.array(Y_train).reshape(-1, num_steps, num_output)
X_test = np.array(X_test).reshape(-1, num_steps, num_input)
Y_test = np.array(Y_test).reshape(-1, num_steps, num_output)
The Keras model consists of two LSTM layers and one layer that trims the output to num_output, which is 2:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(hidden_size, batch_input_shape=(batch_size, num_steps, num_input),
               return_sequences=True, stateful=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_output, activation='softmax')))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
The generator, as well as the training (hidden_states[x] has shape (2,)):
from keras import backend as K

def gen_data():
    x = np.zeros((batch_size, num_steps, num_input))
    y = np.zeros((batch_size, num_steps, num_output))
    while True:
        for i in range(batch_size):
            # hidden_states[x] has shape (2,)
            model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx])
            x[i, :, :] = X_train[gen_data.current_idx]
            y[i, :, :] = Y_train[gen_data.current_idx]
            gen_data.current_idx += 1
        yield x, y

gen_data.current_idx = 0

for epoch in range(100):
    model.fit_generator(gen_data(), len(X_train)//batch_size, 1,
                        validation_data=None, max_queue_size=1, shuffle=False)
    gen_data.current_idx = 0
This code does not give me an error, but I have two questions about it:
1) Inside the generator I set the hidden state of the LSTM, model.layers[0].states[0], to a variable built from hidden_states[gen_data.current_idx], which has shape (2,). Why is this possible for an LSTM with a hidden size greater than 2?
2) The values in hidden_states[gen_data.current_idx] could also be an output of the Keras model. Does it make sense to set the hidden state of a two-layer LSTM in this way?
Recommended answer
States in an LSTM
An LSTM is made up of gates that compute the cell state and the hidden state.
In the figure, the top arrow leaving the right side of the LSTM is the cell state (c_t) and the bottom arrow is the hidden state (h_t). The states are the result of gated manipulation, and each has the same size as the hidden_size of the LSTM. Every unrolled step (with its corresponding input X) produces its own state. For an LSTM, the state is composed of two tensors: the hidden state (h_t) of shape (batch_size, hidden_size) and the cell state (c_t) of shape (batch_size, hidden_size).
import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

batch_size = 2
num_steps = 5
num_input = num_output = 1
hidden_size = 8
inputs = Input(batch_shape=(batch_size, num_steps, num_input))
# return_state=True also returns the final hidden state and cell state
lstm, state_h, state_c = LSTM(hidden_size, return_state=True, return_sequences=True)(inputs)
model = Model(inputs=inputs, outputs=[state_h, state_c])
print(model.predict(np.zeros((batch_size, num_steps, num_input))))
print(model.layers[1].cell.state_size)
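Since the figure is not reproduced here, the gate computation that yields h_t and c_t can be sketched in plain Python. This is only an illustration of the standard LSTM update for a single scalar unit (hidden_size = 1), not Keras code; the function lstm_step, the weight dictionary layout, and the gate names are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, w):
    """One step of a scalar LSTM cell (hidden_size = 1).

    w maps each gate name to a hypothetical (input weight, recurrent weight, bias).
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x_t + w_h * h_prev + b

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value to write

    c_t = f * c_prev + i * g          # new cell state
    h_t = o * math.tanh(c_t)          # new hidden state
    return h_t, c_t


# With all-zero weights every gate equals 0.5 and the candidate equals 0,
# so the cell state is simply halved.
gate_names = ("input", "forget", "output", "candidate")
zero_weights = {name: (0.0, 0.0, 0.0) for name in gate_names}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, w=zero_weights)
```

Both values returned by the step are carried to the next time step, which is why an LSTM layer exposes two state tensors.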
Note: a GRU or simple RNN has no cell state, only a hidden state, so its state is just h_t of shape (batch_size, hidden_size).
The number of state tensors is 1 (for RNN and GRU) or 2 (for LSTM).
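As a quick sanity check on these shapes, a small helper can list what each recurrent layer type expects. The function state_shapes is hypothetical and written only for this illustration; it is not part of Keras.

```python
def state_shapes(cell_type, batch_size, hidden_size):
    """Return the expected shape of every state tensor for a recurrent layer."""
    # An LSTM carries h_t and c_t; a simple RNN or GRU carries only h_t.
    n_states = {"rnn": 1, "gru": 1, "lstm": 2}[cell_type]
    return [(batch_size, hidden_size)] * n_states


# Values from the question: batch_size = 10, hidden_size = 8.
expected = state_shapes("lstm", batch_size=10, hidden_size=8)
provided = (2,)  # shape of hidden_states[x] in the question
matches = [provided == shape for shape in expected]
```

With the question's settings, neither expected state shape equals (2,), which suggests the assignment in the generator cannot really be filling the layer's existing state.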
In your example, layers[0] refers to the first LSTM and layers[1] refers to the second LSTM. If your intention is to initialise the cell state (c_t) of the nth batch from the cell state of batch (n-1), i.e. the previous batch, there are two options:
- Do it the way you are doing it in the generator, but use states[1] if you want c_t and states[0] for h_t. Similarly, use layers[0] for the first LSTM and layers[1] for the second LSTM. But use the set_value method instead. See the edit below.
- Use Keras stateful=True: with stateful set to True, the LSTM states are not reset after every batch. So if you have a batch of 5 data samples (each of some sequence length), you get a state for each of the 5 samples. With stateful set to True, these states are used to initialize the states for the next batch.
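The difference between resetting and carrying states across batches can be sketched with a toy recurrence. ToyRecurrentLayer and its update rule h_t = 0.5 * h_prev + x_t are invented for this illustration and only imitate the bookkeeping, not a real LSTM or the Keras API.

```python
class ToyRecurrentLayer:
    """Toy layer with one scalar state per sample: h_t = 0.5 * h_prev + x_t."""

    def __init__(self, batch_size, stateful):
        self.stateful = stateful
        self.states = [0.0] * batch_size

    def run_batch(self, batch):
        # Without stateful, every batch starts from zeros again.
        if not self.stateful:
            self.states = [0.0] * len(self.states)
        for i, sequence in enumerate(batch):
            h = self.states[i]
            for x_t in sequence:
                h = 0.5 * h + x_t
            self.states[i] = h  # final state of sample i
        return list(self.states)


batch = [[1.0, 1.0], [2.0, 2.0]]

stateless = ToyRecurrentLayer(batch_size=2, stateful=False)
stateless_first = stateless.run_batch(batch)
stateless_second = stateless.run_batch(batch)   # same as the first run

stateful = ToyRecurrentLayer(batch_size=2, stateful=True)
stateful_first = stateful.run_batch(batch)
stateful_second = stateful.run_batch(batch)     # continues from the first run
```

In the stateful case, sample i of one batch continues into sample i of the next batch, so the order of samples and batches matters; that is consistent with the question passing shuffle=False.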
The set_value method should be used to set the value of a tensor variable. The code model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) is valid because what it does is rebind states[0], which was pointing to a variable of shape (batch_size, hidden_size), to a new variable with a different shape, for example (batch_size, 2). It does not change the value of the tensor variable; it makes states[0] point to a new tensor variable of a different dimension.
Test code:
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
model.layers[0].states[0]= K.variable(np.random.randn(10,2))
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
Output:
<tf.Variable 'lstm_18/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f8812e6ee10
<tf.Variable 'Variable_2:0' shape=(10, 2) dtype=float32_ref> 0x7f881269afd0
As you can see, they are two different variables. The correct way to do this is:
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
K.set_value(model.layers[0].states[0], np.random.randn(10,8))
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
Output:
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
With your code fixed this way, a call such as
K.set_value(model.layers[0].states[0], np.random.randn(10,2))
will throw an error, because the shape of the tensor and the shape of the value you are setting do not match.
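The distinction between rebinding a list entry and updating a variable in place can be reproduced without Keras. In the sketch below, ToyVariable, filled, and this set_value are hypothetical stand-ins written for illustration; they mimic the idea of a fixed-shape backend variable and are not the Keras backend functions.

```python
class ToyVariable:
    """Stand-in for a backend variable: fixed shape, mutable contents."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]
        self.shape = (len(rows), len(rows[0]))


def filled(n_rows, n_cols, value):
    return [[value] * n_cols for _ in range(n_rows)]


def set_value(variable, rows):
    """Overwrite the contents in place, refusing values of a different shape."""
    new_shape = (len(rows), len(rows[0]))
    if new_shape != variable.shape:
        raise ValueError(f"shape mismatch: variable {variable.shape}, value {new_shape}")
    variable.rows = [list(row) for row in rows]


batch_size, hidden_size = 10, 8
states = [ToyVariable(filled(batch_size, hidden_size, 0.0))]
captured = states[0]  # reference that anything built earlier would still hold

# Rebinding: the list entry now names a new object; the original is untouched.
states[0] = ToyVariable(filled(batch_size, 2, 1.0))
rebound_is_original = states[0] is captured
original_after_rebind = captured.rows[0][0]

# In-place update: the original object itself receives the new values.
states[0] = captured
set_value(states[0], filled(batch_size, hidden_size, 1.0))
original_after_set_value = captured.rows[0][0]

# A value of the wrong shape is rejected instead of silently accepted.
try:
    set_value(states[0], filled(batch_size, 2, 1.0))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

The rebinding step never raises, whatever the shape, because nothing compares the new object with the old one. That is the same reason the assignment in the question runs without an error.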