Stateful LSTM and stream predictions
Question
I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, shaped like the sample below (the numbers are just placeholders for the purpose of explanation); each batch is labeled 0 or 1:
Data:
[
    [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]],
    [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]],
    [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]],
    ...
]
i.e., batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so each batch has shape (m, 7, 3)).
Targets:
[
    [1],
    [0],
    [1],
    ...
]
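For concreteness, here is a minimal sketch of what such arrays look like in NumPy (the batch size m = 4 and the random values are placeholders of my own):

import numpy as np

m = 4  # hypothetical number of sequences per batch
X = np.random.rand(m, 7, 3)                # m sequences, 7 timesteps, 3 features each
y = np.random.randint(0, 2, size=(m, 1))   # one 0/1 label per sequence

print(X.shape)  # (4, 7, 3)
print(y.shape)  # (4, 1)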
In my production environment, the data is a stream of samples with 3 features ([1,2,3], [1,2,3], ...). I would like to feed each sample to the model as it arrives and get an intermediate probability without waiting for the entire batch of 7 (an animation illustrating this accompanied the original question).
One of my thoughts was to pad the batch with zeros for the missing samples:

[[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]]

but that seems inefficient.
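For reference, a sketch of that padding workaround (pad_to_batch is a hypothetical helper of mine, not from the original code); it also shows why this is wasteful, since the zero rows get re-processed on every new sample:

import numpy as np

def pad_to_batch(received, length=7, features=3):
    # left-pad the samples received so far with zero rows up to the full length
    padded = np.zeros((1, length, features))
    padded[0, length - len(received):] = received
    return padded

# after the first sample arrives:
batch = pad_to_batch([[1, 2, 3]])
# batch[0] is [[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]]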
I would appreciate any help pointing me in the right direction: both saving the LSTM's intermediate state in a persistent way while waiting for the next sample, and predicting with a model that was trained on a specific batch size using partial data.
Update, including the model code:
from keras import optimizers
from keras.models import Sequential
from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense
import keras_metrics

opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)

model = Sequential()
num_features = data.shape[2]
num_samples = data.shape[1]

first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features),
                  return_sequences=True, activation='tanh')
model.add(first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(LSTM(16, return_sequences=True, activation='tanh'))
model.add(Dropout(0.2))
model.add(LeakyReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

# f1 is a custom metric defined elsewhere
model.compile(loss='binary_crossentropy', optimizer=opt,
              metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100, 32) 6272
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 100, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 100, 32) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 100, 16) 3136
_________________________________________________________________
dropout_2 (Dropout) (None, 100, 16) 0
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 100, 16) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 1600) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 1601
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________
Answer
I think there might be an easier solution.
If your model does not have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True.
The Flatten layer, however, transforms the length dimension into a feature dimension, and this will completely prevent you from achieving your goal: if the Flatten layer is expecting 7 steps, you will always need 7 steps. You can see this in the summary above, where Flatten turns (None, 100, 16) into (None, 1600), tying the following Dense layer to a fixed number of steps.
So, before applying the answer below, fix your model so that it does not use a Flatten layer. Instead, simply remove the return_sequences=True from the last LSTM layer.
The following code fixes that and also prepares a few things to be used with the answer below:
def createModel(forTraining):
    # model for training: stateful=False, any batch size
    if forTraining == True:
        batchSize = None
        stateful = False
    # model for predicting: stateful=True, fixed batch size of 1
    else:
        batchSize = 1
        stateful = True

    # leave the length dimension free (None) for the predicting model,
    # so it can receive one step at a time
    length = num_samples if forTraining else None

    model = Sequential()
    first_lstm = LSTM(32,
                      batch_input_shape=(batchSize, length, num_features),
                      return_sequences=True, activation='tanh',
                      stateful=stateful)
    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))

    # this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(LeakyReLU())

    # don't add a Flatten!!!
    # model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    if forTraining == True:
        compileThisModel(model)

    return model
With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.
First, train this new model again, because it has no Flatten layer:
trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)
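compileThisModel and trainThisModel are placeholders in the answer; a minimal sketch of what they might look like (the optimizer, epochs, and batch_size here are assumptions, and data/targets are the training arrays described in the question):

def compileThisModel(model):
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

def trainThisModel(model):
    # data has shape (m, 7, 3), targets has shape (m, 1)
    model.fit(data, targets, epochs=10, batch_size=32)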
Now, with this trained model, you can create a new model in exactly the same way, but marking stateful=True in all its LSTM layers. Then copy the weights from the trained model:
Since these new layers need a fixed batch size (one of Keras' rules), I assumed it would be 1 (a single stream coming in, not m streams) and added that to the model creation above.
predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())
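This weight transfer works because stateful and the fixed batch size only change how the model is executed, not the shapes of its weights, so the two models' weights match one-to-one.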
And voilà. Just predict the model's output with a single step:
# pseudo loop, as samples arrive at your model:
prob = predictingModel.predict_on_batch(sample)
# where sample.shape == (1, 1, 3)
When you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model thinking it should be appended to the end of the previous one.
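Putting the two together, a sketch of the full streaming loop (sample_stream and the fixed sequence length of 7 are assumptions based on the question):

import numpy as np

steps_seen = 0
for features in sample_stream:                        # e.g. yields [1, 2, 3] per tick
    sample = np.array(features).reshape(1, 1, 3)      # (batch=1, steps=1, features=3)
    prob = predictingModel.predict_on_batch(sample)[0, 0]
    print('intermediate probability:', prob)

    steps_seen += 1
    if steps_seen == 7:                               # end of one logical sequence
        predictingModel.reset_states()
        steps_seen = 0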
To persist the LSTM states, just get and set them, saving with h5py:
# requires: import h5py; from keras.layers import RNN; import keras.backend as K
# (see the working test below)
def saveStates(model, saveName):
    f = h5py.File(saveName, 'w')
    for l, lay in enumerate(model.layers):
        # if you have nested models, consider making this recursive,
        # testing for layers inside layers
        if isinstance(lay, RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat),
                                 dtype=K.dtype(stat))
    f.close()

def loadStates(model, saveName):
    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())
    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)
        K.set_value(model.layers[layer].states[state], f.get(stateKey))
    f.close()
Working test for saving/loading states:
import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K

def createModel():
    inp = Input(batch_shape=(1, None, 3))
    out = LSTM(5, return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp, out)
    return model

def saveStates(model, saveName):
    f = h5py.File(saveName, 'w')
    for l, lay in enumerate(model.layers):
        # if you have nested models, consider making this recursive, testing for layers inside layers
        if isinstance(lay, RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))
    f.close()

def loadStates(model, saveName):
    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())
    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)
        K.set_value(model.layers[layer].states[state], f.get(stateKey))
    f.close()

def printStates(model):
    for l in model.layers:
        # if you have nested models, consider making this recursive, testing for layers inside layers
        if isinstance(l, RNN):
            for s in l.states:
                print(K.eval(s))

model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1, 5, 3)))  # changes model1's states

print('model1')
printStates(model1)
print('model2')
printStates(model2)

saveStates(model1, 'testStates5')
loadStates(model2, 'testStates5')

print('model1')
printStates(model1)
print('model2')
printStates(model2)
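If everything works, the second pair of printStates calls prints identical arrays for model1 and model2, confirming that the states were saved to disk and restored correctly.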
Considerations on the aspects of the data

In your first model (if it is stateful=False), each of the m sequences is considered individual and not connected to the others; each batch is also considered to contain unique sequences.
If this is not the case, you might want to train the stateful model instead (considering that each sequence is actually connected to the previous sequence). Then you would need m batches of 1 sequence each -> m x (1, 7 or None, 3).
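If you do go that route, a minimal sketch of such a training loop (num_epochs and the batches iterable are assumptions; trainingModel would have to be built with stateful=True and batch size 1):

for epoch in range(num_epochs):
    for X_batch, y_batch in batches:      # each X_batch has shape (1, 7, 3), continuing the previous batch
        trainingModel.train_on_batch(X_batch, y_batch)
    trainingModel.reset_states()          # start each epoch with fresh states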