LSTM - Making predictions on partial sequence


Problem description


This question is a follow-up to a previous question I've asked.

I've trained an LSTM model to predict a binary class (1 or 0) for batches of 100 samples with 3 features each, i.e., the shape of the data is (m, 100, 3), where m is the number of batches.

Data:

[
    [[1,2,3],[1,2,3]... 100 samples],
    [[1,2,3],[1,2,3]... 100 samples],
    ... available batches in the training data
]

Target:

[
   [1]
   [0]
   ...
]
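
For concreteness, dummy arrays with these shapes could be built as follows (a sketch; m = 75 is an arbitrary placeholder):

import numpy as np

m = 75                                 # number of batches (placeholder)
X = np.random.rand(m, 100, 3)          # (batches, timesteps, features)
y = np.random.randint(0, 2, (m, 1))    # one binary label per batch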

Model code:

# Imports assumed by this snippet (not shown in the question):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, LeakyReLU, Dropout, Dense
from keras import optimizers
import keras_metrics  # third-party keras-metrics package; `f1` is a custom metric defined elsewhere

def build_model(num_samples, num_features, is_training):
    model = Sequential()
    opt = optimizers.Adam(lr=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)

    # The training model is stateless with a flexible batch size;
    # the prediction model is stateful with a fixed batch size of 1.
    batch_size = None if is_training else 1
    stateful = not is_training
    first_lstm = LSTM(32, batch_input_shape=(batch_size, num_samples, num_features), return_sequences=True,
                      activation='tanh', stateful=stateful)

    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))
    model.add(LSTM(16, return_sequences=True, activation='tanh', stateful=stateful))
    model.add(Dropout(0.2))
    model.add(LeakyReLU())
    model.add(LSTM(8, return_sequences=False, activation='tanh', stateful=stateful))
    model.add(LeakyReLU())
    model.add(Dense(1, activation='sigmoid'))

    if is_training:
        model.compile(loss='binary_crossentropy', optimizer=opt,
                      metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])
    return model
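
Presumably (not shown in the question), the trained weights are then copied into the stateful replica for prediction, along these lines; the variable names and the num_samples=1 argument are assumptions:

training_model = build_model(100, 3, is_training=True)
# ... fit training_model on the (m, 100, 3) data ...
predicting_model = build_model(1, 3, is_training=False)  # one timestep per call
predicting_model.set_weights(training_model.get_weights())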

During the training stage, the model is NOT stateful. When predicting, I use a stateful model, iterating over the data and outputting a probability for each sample:

for index, row in data.iterrows():  # data is a pandas DataFrame of consecutive samples
    if index % 100 == 0:
        predicting_model.reset_states()  # a new 100-sample batch starts: clear the LSTM state
    vals = np.array([[row[['a', 'b', 'c']].values]])  # shape (1, 1, 3): one timestep
    prob = predicting_model.predict_on_batch(vals)

When looking at the probability at the end of a batch, it is exactly the value I get when predicting on the entire batch at once (rather than one sample at a time). However, I expected the probability to keep moving in the right direction as new samples arrive. What actually happens is that the probability output can spike toward the wrong class at an arbitrary sample (see below).
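
For reference, that equivalence can be checked with a sketch like the following (the model names and the batch array, of shape (1, 100, 3), are assumptions):

whole = training_model.predict(batch)  # stateless, whole batch at once: shape (1, 1)
predicting_model.reset_states()
for t in range(batch.shape[1]):
    # stateful, one timestep at a time
    step = predicting_model.predict_on_batch(batch[:, t:t+1, :])
assert np.isclose(whole[0, 0], step[0, 0], atol=1e-5)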


Two 100-sample batches over the course of prediction, label = 1: [plot omitted]

and label = 0: [plot omitted]

Is there a way to achieve what I want (avoid extreme spikes while predicting probabilities), or is that simply to be expected?

Any explanation or advice would be appreciated.


Update: Following @today's advice, I've tried training the network with the hidden-state output at each input time step, using return_sequences=True on the last LSTM layer.

So now the labels look like this (shape (100, 100)):

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
...]

The model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           4608      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136      
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 100, 8)            800       
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 100, 8)            0         
_________________________________________________________________
dense_1 (Dense)              (None, 100, 1)            9         
=================================================================
Total params: 8,553
Trainable params: 8,553
Non-trainable params: 0
_________________________________________________________________

However, I get an exception:

ValueError: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (75, 100)

What do I need to fix?

Solution

Note: This is just an idea and it might be wrong. Try it if you would like, and I would appreciate any feedback.


Is there a way to achieve what I want (avoid extreme spikes while predicting probabilities), or is that simply to be expected?

You can try the following experiment: set the return_sequences argument of the last LSTM layer to True and replicate the label of each sample as many times as the length of that sample. For example, if a sample has a length of 100 and its label is 0, then create a new label for this sample consisting of 100 zeros (you can easily do this using a NumPy function like np.repeat). Then retrain your new model and test it on new samples afterwards. I am not sure about this, but I would expect more monotonically increasing/decreasing probability graphs this time.


Update: The error you mentioned is caused by the fact that the labels should be a 3D array (look at the output shape of the last layer in the model summary). Use np.expand_dims to add another axis of size one at the end. Assuming y_train has a shape of (num_samples,), the correct way of repeating the labels looks like this:

rep_y_train = np.repeat(y_train, num_reps).reshape(-1, num_reps, 1)
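
As a quick sanity check (the numbers here are placeholders), repeating three labels over 100 timesteps each:

import numpy as np

y_train = np.array([1, 0, 1])                              # shape (3,)
rep_y_train = np.repeat(y_train, 100).reshape(-1, 100, 1)
print(rep_y_train.shape)                                   # (3, 100, 1)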


The experiment on the IMDB dataset:

Actually, I tried the experiment suggested above on the IMDB dataset using a simple model with one LSTM layer. One time, I used only one label per sample (as in @Shlomi's original approach), and the other time I replicated the labels to have one label per timestep of each sample (as I suggested above). Here is the code if you would like to try it yourself:

from keras.layers import *
from keras.models import Sequential, Model
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
import numpy as np

vocab_size = 10000
max_len = 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
X_train = pad_sequences(x_train, maxlen=max_len)

def create_model(return_seq=False, stateful=False):
    batch_size = 1 if stateful else None
    model = Sequential()
    model.add(Embedding(vocab_size, 128, batch_input_shape=(batch_size, None)))
    # note: CuDNNLSTM requires a GPU; substitute LSTM to run on CPU
    model.add(CuDNNLSTM(64, return_sequences=return_seq, stateful=stateful))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
    return model

# train model with one label per sample
train_model = create_model()
train_model.fit(X_train, y_train, epochs=10, batch_size=128, validation_split=0.3)

# replicate the labels
y_train_rep = np.repeat(y_train, max_len).reshape(-1, max_len, 1)

# train model with one label per timestep
rep_train_model = create_model(True)
rep_train_model.fit(X_train, y_train_rep, epochs=10, batch_size=128, validation_split=0.3)
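
As an aside, the replicated-label model now outputs one probability per timestep rather than one per sequence, e.g. (a hypothetical check):

probs = rep_train_model.predict(X_train[:2])
print(probs.shape)  # (2, 200, 1): one probability per timestep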

Then we can create the stateful replicas of the training models and run them on some test data to compare their results:

# replica of `train_model` with the same weights
test_model = create_model(False, True)
test_model.set_weights(train_model.get_weights())
test_model.reset_states()

# replica of `rep_train_model` with the same weights
rep_test_model = create_model(True, True)
rep_test_model.set_weights(rep_train_model.get_weights())
rep_test_model.reset_states()

def stateful_predict(model, samples):
    preds = []
    for s in samples:
        model.reset_states()  # fresh LSTM state for each sequence
        ps = []
        for ts in s:
            # feed a single timestep: input shape (1, 1)
            p = model.predict(np.array([[ts]]))
            ps.append(p[0, 0])
        preds.append(list(ps))
    return preds

X_test = pad_sequences(x_test, maxlen=max_len)

Actually, the first sample of X_test has a 0 label (i.e. it belongs to the negative class) and the second sample of X_test has a 1 label (i.e. it belongs to the positive class). So let's first see what the stateful predictions of test_model (i.e. the one that was trained using one label per sample) would look like for these two samples:

import matplotlib.pyplot as plt

preds = stateful_predict(test_model, X_test[0:2])

plt.plot(preds[0])
plt.plot(preds[1])
plt.legend(['Class 0', 'Class 1'])

The result: [plot omitted]

The label (i.e. the probability) is correct at the end (i.e. at timestep 200), but the curve is very spiky and fluctuates in between. Now let's compare it with the stateful predictions of rep_test_model (i.e. the one that was trained using one label per timestep):

preds = stateful_predict(rep_test_model, X_test[0:2])

plt.plot(preds[0])
plt.plot(preds[1])
plt.legend(['Class 0', 'Class 1'])

The result: [plot omitted]

Again, the label prediction is correct at the end, but this time with a much smoother and more monotonic trend, as expected.

Note that this was just an example for demonstration, so I used a very simple model here with just one LSTM layer and did not attempt to tune it at all. I guess that with better tuning of the model (e.g. adjusting the number of layers, the number of units in each layer, the activation functions, the optimizer type and parameters, etc.), you might get far better results.
