Strange behaviour in sequence to sequence learning for variable length sequences

Question

I am training a sequence to sequence model for variable length sequences with Keras, but I am running into some unexpected problems. It is unclear to me whether the behaviour I am observing is the desired behaviour of the library and why it would be.

I've made a recurrent model with an embeddings layer and a GRU recurrent layer that illustrates the problem. I used mask_zero=True for the embeddings layer instead of a masking layer, but changing this doesn't seem to make a difference (nor does adding a masking layer before the output):

import numpy
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model
import keras.preprocessing.sequence

numpy.random.seed(0)
input_layer = Input(shape=(3,), dtype='int32', name='input')
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3, mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)
model = Model(input=input_layer, output=output_layer)
output_weights = model.layers[-1].get_weights()
output_weights[1] = numpy.array([0.2])
model.layers[-1].set_weights(output_weights)
model.compile(loss='mse', metrics=['mse'], optimizer='adam', sample_weight_mode='temporal')

I use masking and the sample_weight parameter to exclude the padding values from the training/evaluation. I will test this model on one input/output sequence which I pad using the Keras padding function:

X = [[1, 2]] 
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3) 
Y = [[[1], [2]]] 
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32') 
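
To make the shapes concrete, here is a NumPy-only sketch of what the padding step is expected to produce. It assumes the pad_sequences defaults (zeros added at the front of each sequence); pad_pre is a hypothetical helper written for this illustration, not a Keras function.

```python
import numpy

def pad_pre(sequences, maxlen, dtype='float32'):
    """Hypothetical stand-in for pad_sequences: left-pad each sequence with zeros."""
    # Any trailing per-timestep dimensions (e.g. one target value per step)
    sample_shape = numpy.asarray(sequences[0]).shape[1:]
    padded = numpy.zeros((len(sequences), maxlen) + sample_shape, dtype=dtype)
    for i, seq in enumerate(sequences):
        seq = numpy.asarray(seq, dtype=dtype)[-maxlen:]  # keep at most the last maxlen steps
        padded[i, maxlen - len(seq):] = seq
    return padded

X_sketch = pad_pre([[1, 2]], maxlen=3)
Y_sketch = pad_pre([[[1], [2]]], maxlen=3)
print(X_sketch.shape, X_sketch.tolist())
print(Y_sketch.shape, Y_sketch.tolist())
```

Under these assumptions the input has shape (1, 3) and the target has shape (1, 3, 1). The extra trailing axis in the target corresponds to the single unit of the Dense(1) layer that is applied at every timestep.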

Output Shape

Why is the output expected to be formatted in this way? Why can I not use input/output sequences that have exactly the same dimensionality? model.evaluate(X_padded, Y_padded) gives me a dimensionality error.

Then, when I run model.predict(X_padded) I get the following output (with numpy.random.seed(0) before generating the model):

[[[ 0.2       ]
  [ 0.19946882]
  [ 0.19175649]]]

Why isn't the first input masked for the output layer? Is the output value computed anyway (and equal to the bias, as the hidden layer values are 0)? This does not seem desirable. Adding a Masking layer before the output layer does not solve this problem.
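
The bias hypothesis can be checked with plain arithmetic. The sketch below is not the Keras model: kernel and hidden are random placeholders, and the zero vector at the first timestep is an assumption about what the recurrent layer emits for a masked step. It only shows that a dense layer applied to an all-zero hidden state returns its bias.

```python
import numpy

rng = numpy.random.RandomState(0)
kernel = rng.normal(size=(5, 1))   # placeholder weights, not the model's actual values
bias = numpy.array([0.2])          # the bias value set via set_weights in the model above

hidden = rng.normal(size=(3, 5))   # placeholder GRU outputs for 3 timesteps
hidden[0] = 0.0                    # assumption: the masked step outputs zeros

# The same dense layer applied independently to each timestep
outputs = hidden.dot(kernel) + bias
print(outputs[0])
```

If that assumption holds, the first predicted value of 0.2 is simply the output bias, which matches the prediction shown above.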

Then, when I evaluate the model (model.evaluate(X_padded, Y_padded)), this returns the Mean Squared Error (MSE) of the entire sequence (1.3168) including this first value, which I suppose is to be expected when it isn't masked, but not what I would want.
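
The reported value can be reproduced by hand from the predictions above. This is a sketch in plain NumPy, assuming the padded target is [0, 1, 2] (zero added at the front); the variable names are only for illustration.

```python
import numpy

y_true = numpy.array([0.0, 1.0, 2.0])                 # Y_padded, flattened
y_pred = numpy.array([0.2, 0.19946882, 0.19175649])   # predictions reported above

squared_error = (y_true - y_pred) ** 2
mse_all = squared_error.mean()            # every timestep, padding included
mse_unpadded = squared_error[1:].mean()   # only the two real timesteps
print(mse_all, mse_unpadded)
```

The mean over all three timesteps comes out at about 1.3168, the value returned by evaluate, while the mean over the two unpadded timesteps would be about 1.9553.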

From the Keras documentation I understand I should use the sample_weight parameter to solve this problem, which I tried:

sample_weight = numpy.array([[0, 1, 1]])
model_evaluation = model.evaluate(X_padded, Y_padded, sample_weight=sample_weight)
print model.metrics_names, model_evaluation

The output I get is

['loss', 'mean_squared_error'] [2.9329459667205811, 1.3168648481369019]

This leaves the metric (MSE) unaltered: it is still the MSE over all values, including the one that I wanted masked. Why? This is not what I want when I evaluate my model. It does cause a change in the loss value, which appears to be the MSE over the last two values, normalised so as not to give more weight to longer sequences.

Am I doing something wrong with the sample weights? Also, I really cannot figure out how this loss value came about. What should I do to exclude the padded values from both training and evaluation? (I assume the sample_weight parameter works the same way in the fit function.)
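
One way to arrive at the reported loss numerically is sketched below. It is inferred from the numbers alone, not taken from the Keras source: the assumption is that the weighted error is rescaled by the fraction of non-zero weights twice (for example once for the mask and once for the sample weights). The padded target [0, 1, 2] and all variable names are likewise assumptions for illustration.

```python
import numpy

y_true = numpy.array([0.0, 1.0, 2.0])                 # Y_padded, flattened
y_pred = numpy.array([0.2, 0.19946882, 0.19175649])   # predictions reported above
weights = numpy.array([0.0, 1.0, 1.0])                # sample_weight for the one sequence

squared_error = (y_true - y_pred) ** 2
kept_fraction = (weights != 0).mean()                 # 2 of 3 timesteps are kept

weighted_mean = (squared_error * weights).mean()      # still averaged over 3 timesteps
rescaled_once = weighted_mean / kept_fraction         # MSE over the two kept timesteps
rescaled_twice = rescaled_once / kept_fraction        # assumed second rescaling
print(rescaled_once, rescaled_twice)
```

With a single rescaling the result is about 1.9553, the MSE over the last two values; with the assumed second rescaling it is about 2.9329, which agrees with the loss printed above.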

Recommended answer

It was indeed a bug in the library; the issue is resolved in Keras 2.
