Strange behaviour in sequence-to-sequence learning for variable length sequences

Views: 68 Published: 2020/4/25 10:49:29 Tags: keras masking recurrent-neural-network

Problem description

I am training a sequence to sequence model for variable length sequences with Keras, but I am running into some unexpected problems. It is unclear to me whether the behaviour I am observing is the desired behaviour of the library and why it would be.

I've made a recurrent model with an embeddings layer and a GRU recurrent layer that illustrates the problem. I used mask_zero=True on the embeddings layer instead of a separate Masking layer, but changing this doesn't seem to make a difference (nor does adding a Masking layer before the output):

import numpy
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model
import keras.preprocessing.sequence

numpy.random.seed(0)
input_layer = Input(shape=(3,), dtype='int32', name='input')
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3, mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)
model = Model(input=input_layer, output=output_layer)
output_weights = model.layers[-1].get_weights()
output_weights[1] = numpy.array([0.2])
model.layers[-1].set_weights(output_weights)
model.compile(loss='mse', metrics=['mse'], optimizer='adam', sample_weight_mode='temporal')

I use masking and the sample_weight parameter to exclude the padding values from the training/evaluation. I will test this model on one input/output sequence which I pad using the Keras padding function:

X = [[1, 2]] 
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3) 
Y = [[[1], [2]]] 
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32')
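
As an illustration of what the padding step produces, here is a minimal NumPy sketch. The helper `pad_pre` is hypothetical; it only mimics the default behaviour of `pad_sequences` (zeros inserted at the front, truncation from the front) and is not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, dtype="float32"):
    """Hypothetical stand-in for pad_sequences with default 'pre' padding."""
    first = np.asarray(sequences[0], dtype=dtype)
    # Keep any trailing feature axes of the samples, e.g. the (1,) axis of Y.
    out = np.zeros((len(sequences), maxlen) + first.shape[1:], dtype=dtype)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=dtype)[-maxlen:]  # keep the last maxlen steps
        if len(seq) > 0:
            out[i, maxlen - len(seq):] = seq          # zeros stay at the front
    return out

X_padded = pad_pre([[1, 2]], maxlen=3)
Y_padded = pad_pre([[[1], [2]]], maxlen=3)
print(X_padded.shape, X_padded.tolist())
print(Y_padded.shape, Y_padded.tolist())
```

The padding zero ends up at the first timestep, so that is the position the mask is expected to cover.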

Output Shape

Why is the output expected to be formatted in this way? Why can I not use input/output sequences that have exactly the same dimensionality? model.evaluate(X_padded, Y_padded) gives me a dimensionality error.
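
The shape requirement can be sketched in NumPy. Applying a one-unit dense layer at every timestep of a (batch, timesteps, units) tensor yields (batch, timesteps, 1), so a flat (batch, timesteps) target needs a trailing axis to match. The weights below are random placeholders, not the model's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, units = 1, 3, 5

recurrent = rng.normal(size=(batch, timesteps, units))  # stand-in for the GRU output
kernel = rng.normal(size=(units, 1))                    # placeholder Dense(1) weights
bias = np.array([0.2])

# The same dense transform is applied at each timestep.
prediction = recurrent @ kernel + bias
print(prediction.shape)

# A flat target of shape (1, 3) gains the trailing axis the output has.
flat_target = np.array([[0.0, 1.0, 2.0]])
target = flat_target[..., np.newaxis]
print(target.shape)
```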

Then, when I run model.predict(X_padded) I get the following output (with numpy.random.seed(0) before generating the model):

[[[ 0.2       ]
  [ 0.19946882]
  [ 0.19175649]]]

Why isn't the first input masked for the output layer? Is the output value computed anyway (and equal to the bias, since the hidden layer values are 0)? This does not seem desirable. Adding a Masking layer before the output layer does not solve this problem.
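
The hypothesis in the paragraph above can be checked with a small NumPy sketch. It assumes the recurrent output at the masked step is all zeros, which is a guess consistent with the observed 0.2 and not something the source confirms. The dense weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = np.zeros(5)               # assumed GRU output at the masked (padded) step
kernel = rng.normal(size=(5, 1))   # placeholder Dense(1) weights
bias = np.array([0.2])             # bias value set in the question

# With a zero input the weights drop out and only the bias remains.
output = hidden @ kernel + bias
print(output)
```

Whatever the dense weights are, a zero hidden state leaves only the bias, which matches the first predicted value.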

Then, when I evaluate the model (model.evaluate(X_padded, Y_padded)), this returns the Mean Squared Error (MSE) of the entire sequence (1.3168) including this first value, which I suppose is to be expected when it isn't masked, but not what I would want.

From the Keras documentation I understand I should use the sample_weight parameter to solve this problem, which I tried:

sample_weight = numpy.array([[0, 1, 1]])
model_evaluation = model.evaluate(X_padded, Y_padded, sample_weight=sample_weight)
print model.metrics_names, model_evaluation

The output I get is:

['loss', 'mean_squared_error'] [2.9329459667205811, 1.3168648481369019]

This leaves the metric (MSE) unaltered: it is still the MSE over all values, including the one that I wanted masked. Why? This is not what I want when I evaluate my model. It does change the loss value, which appears to be the MSE over the last two values, normalised so that longer sequences are not given more weight.

Am I doing something wrong with the sample weights? Also, I really cannot figure out how this loss value came about. What should I do to exclude the padded values from both training and evaluation? (I assume the sample_weight parameter works the same way in the fit function.)
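
One possible reading of the two reported numbers can be reproduced in NumPy from the predictions printed above. The sketch assumes that the metric is a plain mean over all timesteps, and that the loss applies the mask and then the sample weights, each followed by a division by the fraction of nonzero entries. That order of operations is an assumption inferred from the numbers, not something stated in the question.

```python
import numpy as np

# Values copied from the question.
y_pred = np.array([0.2, 0.19946882, 0.19175649])
y_true = np.array([0.0, 1.0, 2.0])        # Y_padded, flattened
x_padded = np.array([0, 1, 2])            # X_padded

mask = (x_padded != 0).astype(float)      # what mask_zero=True is expected to derive
sample_weight = np.array([0.0, 1.0, 1.0])

per_step = (y_pred - y_true) ** 2

# Metric: mean over every timestep, ignoring mask and weights.
metric = per_step.mean()

# Loss (assumed order): mask, rescale, weight, rescale, then mean.
score = per_step * mask
score = score / mask.mean()
score = score * sample_weight
score = score / (sample_weight != 0).mean()
loss = score.mean()

# MSE restricted to the two unpadded steps, for comparison.
unpadded_mse = per_step[mask == 1].mean()

print(loss, metric, unpadded_mse)
```

Under these assumptions the sketch gives a loss of about 2.9329 and a metric of about 1.3169, in line with the reported output. The padded step is excluded twice, once by the mask and once by the weights, and each exclusion rescales by 3/2, which would explain why the loss is larger than the MSE over the two unpadded steps.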
