Keras LSTM with a masking layer for variable-length inputs
Problem description
I know this subject has been asked about many times, but I couldn't find a solution to my problem.
I am training an LSTM network on variable-length inputs using a masking layer, but the masking seems to have no effect.
The input shape is (100, 362, 24), where 362 is the maximum sequence length, 24 is the number of features, and 100 is the number of samples (split 75 train / 25 validation).
The output shape is (100, 362, 1), later transformed to (100, 362 - N, 1).
Here is the code for my network:
from keras import Sequential
from keras.layers import Embedding, Masking, LSTM, Lambda
import keras.backend as K

#                          O O O
# example for N:3          | | |
#                    O O O O O O
#                    | | | | | |
#                    O O O O O O
N = 5

# drop the first N target timesteps so y matches the model output
y = y[:, N:, :]

x_train = x[:75]
x_test = x[75:]
y_train = y[:75]
y_test = y[75:]

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
# drop the first N output timesteps
model.add(Lambda(lambda x: x[:, N:, :]))

model.compile('adam', 'mae')
print(model.summary())

history = model.fit(x_train, y_train,
                    epochs=3,
                    batch_size=15,
                    validation_data=(x_test, y_test))
My data is padded at the end. For example:
>> x_test[10,350]
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.], dtype=float32)
The problem is that the masking layer seems to have no effect. I can see this because the loss printed during training equals the unmasked loss that I compute afterwards:
Layer (type)                 Output Shape              Param #
=================================================================
masking_1 (Masking)          (None, 362, 24)           0
_________________________________________________________________
lstm_1 (LSTM)                (None, 362, 128)          78336
_________________________________________________________________
lstm_2 (LSTM)                (None, 362, 64)           49408
_________________________________________________________________
lstm_3 (LSTM)                (None, 362, 1)            264
_________________________________________________________________
lambda_1 (Lambda)            (None, 357, 1)            0
=================================================================
Total params: 128,008
Trainable params: 128,008
Non-trainable params: 0
_________________________________________________________________
None
Train on 75 samples, validate on 25 samples
Epoch 1/3
75/75 [==============================] - 8s 113ms/step - loss: 0.1711 - val_loss: 0.1814
Epoch 2/3
75/75 [==============================] - 5s 64ms/step - loss: 0.1591 - val_loss: 0.1307
Epoch 3/3
75/75 [==============================] - 5s 63ms/step - loss: 0.1057 - val_loss: 0.1034
>> from sklearn.metrics import mean_absolute_error
>> out = model.predict(x_test, batch_size=1)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
wo mask 0.10343371
w mask 0.16236152
Furthermore, if I use NaN for the masked output values, I can see the NaN being propagated during training (the loss equals nan).
What am I missing to make the masking layer work as expected?
Recommended answer
The Lambda layer, by default, does not propagate masks. In other words, the mask tensor computed by the Masking layer is thrown away by the Lambda layer, and thus the Masking layer has no effect on the output loss.
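To make "the mask tensor computed by the Masking layer" concrete, here is a minimal NumPy sketch of the rule the Masking layer documents: a timestep is masked only when every feature equals mask_value. The helper name timestep_mask and the toy array are assumptions made for illustration; this is not Keras code.

```python
import numpy as np

def timestep_mask(x, mask_value=0.0):
    """Return a (samples, timesteps) boolean array, True where a timestep is kept."""
    # A timestep is dropped only if ALL of its features equal mask_value.
    return np.any(x != mask_value, axis=-1)

# Toy batch (hypothetical data): 2 samples, 4 timesteps, 3 features,
# zero-padded at the end like the data in the question.
x = np.zeros((2, 4, 3), dtype=np.float32)
x[0, :3] = 1.0  # first sample has 3 real timesteps
x[1, :2] = 2.0  # second sample has 2 real timesteps

mask = timestep_mask(x)
lengths = mask.sum(axis=1)  # number of unpadded timesteps per sample
print(mask)
print(lengths)
```

The mask has one entry per (sample, timestep), so any layer that changes the number of timesteps also has to reshape the mask accordingly.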
If you want the compute_mask method of a Lambda layer to propagate the previous mask, you have to provide the mask argument when the layer is created. This can be seen in the source code of the Lambda layer:
def __init__(self, function, output_shape=None,
             mask=None, arguments=None, **kwargs):
    # ...
    if mask is not None:
        self.supports_masking = True
    self.mask = mask
    # ...

def compute_mask(self, inputs, mask=None):
    if callable(self.mask):
        return self.mask(inputs, mask)
    return self.mask
Because the default value of mask is None, compute_mask returns None and the loss is not masked at all.
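The behaviour can be reproduced without Keras. The sketch below uses a hypothetical stand-in class, LambdaMaskSketch, that copies only the mask logic from the excerpt above; the toy mask and the value N = 2 are assumptions chosen for illustration.

```python
import numpy as np

class LambdaMaskSketch:
    """Hypothetical stand-in that mirrors only the mask logic quoted above."""

    def __init__(self, mask=None):
        self.supports_masking = mask is not None
        self.mask = mask

    def compute_mask(self, inputs, mask=None):
        if callable(self.mask):
            return self.mask(inputs, mask)
        return self.mask

N = 2  # illustrative number of leading timesteps to drop
inputs = np.zeros((1, 6, 1))  # placeholder layer input (values are irrelevant here)
previous_mask = np.array([[True, True, True, True, False, False]])

# Default construction: the incoming mask is discarded.
default_layer = LambdaMaskSketch()
dropped = default_layer.compute_mask(inputs, previous_mask)

# With a callable, the incoming mask is passed through (here sliced to the new length).
sliced_layer = LambdaMaskSketch(mask=lambda inputs, previous_mask: previous_mask[:, N:])
kept = sliced_layer.compute_mask(inputs, previous_mask)

print(dropped)  # None
print(kept)
```

With the default, the layers after it never see a mask; with a callable, they receive whatever the callable returns.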
To fix the problem, since your Lambda layer itself does not introduce any additional masking, the compute_mask method should just return the mask from the previous layer (sliced appropriately to match the output shape of the layer).
masking_func = lambda inputs, previous_mask: previous_mask[:, N:]
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :], mask=masking_func))
Now you should be able to see the correct loss value.
>> model.evaluate(x_test, y_test, verbose=0)
0.2660679519176483
>> out = model.predict(x_test)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
wo mask 0.26519736809498456
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
w mask 0.2660679670482195
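For reference, the difference between the two figures is just which timesteps enter the average. A minimal NumPy sketch with made-up toy values (the helper mae and all numbers below are assumptions, not the data from the question):

```python
import numpy as np

def mae(y_true, y_pred, keep=None):
    """Mean absolute error; if keep is given, average only where keep is True."""
    err = np.abs(y_true - y_pred)
    if keep is None:
        return float(err.mean())
    return float(err[keep].mean())

# Toy data: 2 samples, 4 timesteps, 1 output; trailing timesteps are padding.
y_true = np.array([[[1.0], [2.0], [0.0], [0.0]],
                   [[3.0], [0.0], [0.0], [0.0]]])
y_pred = np.array([[[1.5], [2.5], [0.0], [0.0]],
                   [[2.0], [0.0], [0.0], [0.0]]])
keep = np.array([[True, True, False, False],
                 [True, False, False, False]])

unmasked = mae(y_true, y_pred)        # averages over all 8 timesteps
masked = mae(y_true, y_pred, keep)    # averages over the 3 real timesteps
print(unmasked, masked)
```

In this toy case the padded timesteps are predicted perfectly, so including them pulls the unmasked error below the masked one, which is the same direction as the gap reported in the question.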
Using NaN values for padding does not work because masking is done by multiplying the loss tensor by a binary mask (0 * nan is still nan, so the mean would be nan).
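The arithmetic is easy to check in NumPy. In the sketch below the loss values are made up, and the np.where variant is shown only to contrast selection with multiplication; it is not a description of what Keras does internally.

```python
import numpy as np

# Per-timestep loss (toy values); nan where the targets were padded with NaN.
loss = np.array([0.2, 0.4, np.nan, np.nan])
mask = np.array([1.0, 1.0, 0.0, 0.0])

# Masking by multiplication: 0 * nan is nan, so the nan survives into the mean.
multiplied = loss * mask
mean_by_multiplication = multiplied.sum() / mask.sum()

# Masking by selection replaces the nan entries before summing.
selected = np.where(mask > 0, loss, 0.0)
mean_by_selection = selected.sum() / mask.sum()

print(mean_by_multiplication)
print(mean_by_selection)
```

This is why zero (or another finite value) is the practical choice for padding the targets when the mask is applied by multiplication.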