Keras lstm with masking layer for variable-length inputs


Problem description

I know this is a subject with a lot of questions but I couldn't find any solution to my problem.

I am training an LSTM network on variable-length inputs using a masking layer, but it seems that it doesn't have any effect.

Input shape (100, 362, 24) with 362 being the maximum sequence length, 24 the number of features and 100 the number of samples (divided 75 train / 25 valid).

Output shape (100, 362, 1), later transformed to (100, 362 - N, 1).
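
A minimal stand-in for data with these shapes (the random values and sequence lengths below are placeholders for the real dataset, zero-padded at the end) could look like:

import numpy as np

# placeholder arrays with the shapes described above (the real dataset is not shown)
samples, timesteps, features = 100, 362, 24
x = np.random.rand(samples, timesteps, features).astype('float32')
y = np.random.rand(samples, timesteps, 1).astype('float32')

# zero-pad each sequence at the end, matching mask_value=0. used below
lengths = np.random.randint(50, timesteps + 1, size=samples)
for i, length in enumerate(lengths):
    x[i, length:] = 0.
    y[i, length:] = 0.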

Here is the code for my network:

from keras import Sequential
from keras.layers import Embedding, Masking, LSTM, Lambda
import keras.backend as K


#                          O O O
#   example for N:3        | | |
#                    O O O O O O
#                    | | | | | | 
#                    O O O O O O

N = 5
# slice off the first N target timesteps to match the Lambda slice on the model output
y = y[:, N:, :]

x_train = x[:75]
x_test = x[75:]
y_train = y[:75]
y_test = y[75:]

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :]))

model.compile('adam', 'mae')

print(model.summary())
history = model.fit(x_train, y_train, 
                    epochs=3, 
                    batch_size=15, 
                    validation_data=[x_test, y_test])

My data is padded at the end. Example:

>> x_test[10,350]
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0.], dtype=float32)

The problem is that the masking layer seems to have no effect. I can see this from the loss value printed during training, which is equal to the unmasked loss I compute afterwards:

Layer (type)                 Output Shape              Param #   
=================================================================
masking_1 (Masking)          (None, 362, 24)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 362, 128)          78336     
_________________________________________________________________
lstm_2 (LSTM)                (None, 362, 64)           49408     
_________________________________________________________________
lstm_3 (LSTM)                (None, 362, 1)            264       
_________________________________________________________________
lambda_1 (Lambda)            (None, 357, 1)            0         
=================================================================
Total params: 128,008
Trainable params: 128,008
Non-trainable params: 0
_________________________________________________________________
None
Train on 75 samples, validate on 25 samples
Epoch 1/3
75/75 [==============================] - 8s 113ms/step - loss: 0.1711 - val_loss: 0.1814
Epoch 2/3
75/75 [==============================] - 5s 64ms/step - loss: 0.1591 - val_loss: 0.1307
Epoch 3/3
75/75 [==============================] - 5s 63ms/step - loss: 0.1057 - val_loss: 0.1034

>> from sklearn.metrics import mean_absolute_error
>> out = model.predict(x_test, batch_size=1)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
wo mask 0.10343371
w mask 0.16236152

Furthermore, if I use NaN for the masked output values, I can see the NaNs being propagated during training (the loss equals nan).

What am I missing to make the masking layer work as expected?

Answer

The Lambda layer, by default, does not propagate masks. In other words, the mask tensor computed by the Masking layer is thrown away by the Lambda layer, and thus the Masking layer has no effect on the output loss.

If you want the compute_mask method of a Lambda layer to propagate the previous mask, you have to provide the mask argument when the layer is created. As can be seen from the source code of the Lambda layer:

def __init__(self, function, output_shape=None,
             mask=None, arguments=None, **kwargs):
    # ...
    if mask is not None:
        self.supports_masking = True
    self.mask = mask

# ...

def compute_mask(self, inputs, mask=None):
    if callable(self.mask):
        return self.mask(inputs, mask)
    return self.mask

Because the default value of mask is None, compute_mask returns None and the loss is not masked at all.

To fix the problem, since your Lambda layer itself does not introduce any additional masking, the compute_mask method should just return the mask from the previous layer (with appropriate slicing to match the output shape of the layer).

# propagate the incoming mask, sliced to match the Lambda layer's output
masking_func = lambda inputs, previous_mask: previous_mask[:, N:]
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :], mask=masking_func))

Now you should be able to see the correct loss value.

>> model.evaluate(x_test, y_test, verbose=0)
0.2660679519176483
>> out = model.predict(x_test)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
wo mask 0.26519736809498456
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
w mask 0.2660679670482195
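
As an alternative (untested sketch) to passing mask= to the Lambda layer, you could subclass Layer and implement compute_mask yourself; the SliceWithMask name below is made up for illustration:

from keras.layers import Layer

class SliceWithMask(Layer):
    """Drops the first n timesteps and forwards the incoming mask, sliced to match."""
    def __init__(self, n, **kwargs):
        super(SliceWithMask, self).__init__(**kwargs)
        self.n = n
        self.supports_masking = True

    def call(self, inputs):
        return inputs[:, self.n:, :]

    def compute_mask(self, inputs, mask=None):
        # return the previous layer's mask, sliced to the new number of timesteps
        return None if mask is None else mask[:, self.n:]

    def compute_output_shape(self, input_shape):
        steps = None if input_shape[1] is None else input_shape[1] - self.n
        return (input_shape[0], steps, input_shape[2])

# usage: model.add(SliceWithMask(N)) in place of the Lambda layer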

Using NaN values for padding does not work because masking is done by multiplying the loss tensor with a binary mask (0 * nan is still nan, so the mean value would be nan).
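
A quick NumPy illustration of why a binary mask cannot cancel NaN padding:

import numpy as np

step_loss = np.array([0.2, 0.1, np.nan, np.nan])  # last two steps are padding
mask = np.array([1., 1., 0., 0.])
print(mask * step_loss)           # [0.2 0.1 nan nan] -- 0 * nan is still nan
print(np.mean(mask * step_loss))  # nan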
