Stateful LSTM and stream predictions

Problem description

I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, shaped like the sample below (the numbers are just placeholders for the purpose of explanation); each batch is labeled 0 or 1:

Data:

[
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   ...
]

i.e., batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so a batch has shape (m, 7, 3)).

Targets:

[
   [1]
   [0]
   [1]
   ...
]

In my production environment, the data is a stream of samples with 3 features ([1,2,3], [1,2,3], ...). I would like to stream each sample to my model as it arrives and get the intermediate probability without waiting for the entire batch of 7 (the original question illustrates this with an animation).

One of my thoughts was to pad the batch with zeros for the missing samples, [[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]], but that seems inefficient.
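For concreteness, a minimal sketch of that zero-padding idea (assuming NumPy; this snippet is not part of the original question):

import numpy as np

#place the samples received so far at the end of a fixed-length window of 7 timesteps
window = np.zeros((1, 7, 3))              #one batch, 7 timesteps, 3 features
received = np.array([[1, 2, 3]])          #samples that have actually arrived so far
window[0, -len(received):, :] = received  #pad the front with zeros
#the model would then be called on the padded batch, e.g. model.predict(window)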

I would appreciate any help pointing me in the right direction, both for saving the LSTM intermediate state in a persistent way while waiting for the next sample, and for predicting with a model trained on a specific batch size using partial data.

Update, including the model code:

#imports assumed by this snippet (not shown in the original question):
from keras import optimizers
from keras.models import Sequential
from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense

opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
model = Sequential()

num_features = data.shape[2]
num_samples = data.shape[1]

first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features),
                  return_sequences=True, activation='tanh')
model.add(first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(LSTM(16, return_sequences=True, activation='tanh'))
model.add(Dropout(0.2))
model.add(LeakyReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

#keras_metrics (third-party package) and f1 (custom metric) are assumed to be defined elsewhere
model.compile(loss='binary_crossentropy', optimizer=opt,
              metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           6272      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136      
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1601      
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________

Recommended answer

I think there might be an easier solution.

If your model does not have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True.

The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal: if the Flatten layer expects 7 steps, it will always need 7 steps.
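To see concretely why, here is a small illustration (a sketch with made-up layer sizes, not the asker's exact model) of how Flatten bakes the step count into the Dense weights:

from keras.layers import Input, LSTM, Flatten, Dense
from keras.models import Model

inp = Input(shape=(7, 3))                 #7 steps, 3 features
x = LSTM(16, return_sequences=True)(inp)  #output shape (None, 7, 16)
x = Flatten()(x)                          #output shape (None, 112): the 7 steps become features
out = Dense(1, activation='sigmoid')(x)   #112 weights + 1 bias, valid only for exactly 7 steps
Model(inp, out).summary()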

So, before applying my answer below, fix your model so it does not use a Flatten layer. Instead, you can just remove the return_sequences=True from the last LSTM layer.

The following code fixes that and also prepares a few things to be used with the answer below:

def createModel(forTraining):

    #model for training, stateful=False, any batch size   
    if forTraining == True:
        batchSize = None
        stateful = False

    #model for predicting, stateful=True, fixed batch size
    else:
        batchSize = 1
        stateful = True

    model = Sequential()

    first_lstm = LSTM(32, 
        batch_input_shape=(batchSize, num_samples, num_features), 
        return_sequences=True, activation='tanh', 
        stateful=stateful)   

    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))

    #this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful,  activation='tanh'))

    model.add(Dropout(0.2))
    model.add(LeakyReLU())

    #don't add a Flatten!!!
    #model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    if forTraining == True:
        compileThisModel(model)   #compileThisModel stands in for the compile call shown in the question

    return model

With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.

First, train this new model again, because it has no Flatten layer:

trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)

Now, with this trained model, you can simply create a new model exactly the same way you created the trained model, but marking stateful=True in all its LSTM layers. Then copy the weights from the trained model.

Since these new layers will need a fixed batch size (Keras' rules), I assumed it would be 1 (a single stream is coming, not m streams) and added it to the model creation above.

predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())

And voilà. Just predict the outputs of the model with a single step:

#pseudo for loop as samples arrive to your model:
    prob = predictingModel.predict_on_batch(sample)

    #where sample.shape == (1, 1, 3)

When you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model treating it as a continuation of the previous one.
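Putting the prediction loop and the reset together, the streaming code could look roughly like this (incoming_stream and sequence_finished are hypothetical placeholders for your own logic):

for sample in incoming_stream():                     #each sample arrives shaped (1, 1, 3)
    prob = predictingModel.predict_on_batch(sample)  #intermediate probability for the sequence so far
    if sequence_finished():                          #your own end-of-sequence criterion
        predictingModel.reset_states()               #start the next sequence fresh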

To keep the LSTM states in a persistent way, just get and set them, saving with h5py:

def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models,
            #consider making this recursive, testing for layers within layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat), 
                                 dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

A working test for saving/loading the states:

import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K




def createModel():
    inp = Input(batch_shape=(1,None,3))
    out = LSTM(5,return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp,out)
    return model


def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, consider making this recursive, testing for layers within layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

def printStates(model):

    for l in model.layers:
        #if you have nested models, consider making this recursive, testing for layers within layers
        if isinstance(l,RNN):
            for s in l.states:
                print(K.eval(s))   

model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states

print('model1')
printStates(model1)
print('model2')
printStates(model2)

saveStates(model1,'testStates5')
loadStates(model2,'testStates5')

print('model1')
printStates(model1)
print('model2')
printStates(model2)

Considerations on the aspects of the data

In your first model (if it is stateful=False), it considers that each of the m sequences is independent and not connected to the others. It also considers that each batch contains unique sequences.

If this is not the case, you might want to train the stateful model instead (considering that each sequence is actually connected to the previous one). You would then need m batches of 1 sequence each -> m x (1, 7 or None, 3).
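As a hedged sketch (not part of the original answer), that stateful training setup could look like this; num_epochs, sequences and targets are assumptions, and each sequence is fed as its own batch of 1 in stream order:

statefulTrainer = createModel(forTraining=False)     #stateful=True, batch size 1, as defined above
statefulTrainer.compile(loss='binary_crossentropy', optimizer='adam')

for epoch in range(num_epochs):
    for seq, label in zip(sequences, targets):       #seq.shape == (1, 7, 3), label.shape == (1, 1)
        statefulTrainer.train_on_batch(seq, label)   #state carries over into the next sequence
    statefulTrainer.reset_states()                   #reset only when the whole stream starts over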
