How do I mask a loss function in Keras with the TensorFlow backend?
Question
I am trying to implement a sequence-to-sequence task using an LSTM in Keras with the TensorFlow backend. The inputs are English sentences of variable length. To construct a dataset with the 2-D shape [batch_number, max_sentence_length], I add EOF at the end of each line and pad each sentence with enough placeholders, e.g. #. Each character in the sentence is then transformed into a one-hot vector, so that the dataset has the 3-D shape [batch_number, max_sentence_length, character_number]. After the LSTM encoder and decoder layers, the softmax cross-entropy between output and target is computed.
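For illustration, a minimal sketch of this encoding (hypothetical helper; EOF is omitted for brevity, and sentences are padded at the front with '#' to match the toy batch further down):

import numpy as np

char_to_index = {'a': 0, 'b': 1, '#': 2}

def encode_batch(sentences, max_sentence_length):
    # one-hot encode each character; '#' is the padding placeholder
    batch = np.zeros((len(sentences), max_sentence_length, len(char_to_index)))
    for i, sentence in enumerate(sentences):
        padded = sentence.rjust(max_sentence_length, '#')  # pad at the front
        for t, ch in enumerate(padded):
            batch[i, t, char_to_index[ch]] = 1.0
    return batch

# ['abb', 'babb'] -> ['##abb', '#babb'], array shape (2, 5, 3)
print(encode_batch(['abb', 'babb'], 5))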
To eliminate the padding effect in model training, masking can be applied to both the input and the loss function. Masking the input in Keras can be done with layers.core.Masking. In TensorFlow, masking the loss function can be done as described here: custom masked loss function in TensorFlow.
However, I can't find a way to realize this in Keras, since a user-defined loss function in Keras only accepts the parameters y_true and y_pred. So how can I feed the true sequence_lengths to the loss function and mask?
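One conceivable workaround is to bake the known lengths into the loss with a closure, since Keras itself only ever passes y_true and y_pred. A sketch (TF 1.x; make_length_masked_loss is a hypothetical helper, and this only works if the batch size and order are fixed, which makes it fragile):

import tensorflow as tf

def make_length_masked_loss(sequence_lengths, max_len):
    # the lengths are captured by the closure; Keras never sees them directly
    mask = tf.cast(tf.sequence_mask(sequence_lengths, maxlen=max_len), tf.float32)
    def loss(y_true, y_pred):
        ce = -tf.reduce_sum(y_true * tf.log(y_pred), axis=2)  # [batch, time]
        return tf.reduce_sum(ce * mask) / tf.reduce_sum(mask)
    return loss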
Besides, I found a function _weighted_masked_objective(fn) in \keras\engine\training.py. Its definition is:
Adds support for masking and sample-weighting to an objective function.
But it seems that this function can only accept fn(y_true, y_pred). Is there a way to use it to solve my problem?
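For reference, a condensed view of the wrapper that this function returns (based on the Keras 2.x source, with sample-weight handling and non-essential branches omitted) shows where a mask, if present, enters the loss:

from keras import backend as K

def _weighted_masked_objective(fn):
    def weighted(y_true, y_pred, weights, mask=None):
        score_array = fn(y_true, y_pred)
        if mask is not None:
            mask = K.cast(mask, K.floatx())
            # zero out the masked entries...
            score_array *= mask
            # ...and renormalize so the loss is proportional
            # to the number of unmasked entries
            score_array /= K.mean(mask)
        return K.mean(score_array)
    return weighted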
To be specific, I modified the example of Yu-Yang.
from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)
max_sentence_length = 5
character_number = 3  # valid characters 'a', 'b' and the placeholder '#'
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)
model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
              [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
# the batch is ['##abb', '#babb'], padded with '#'
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
                   [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf

logits = tf.constant(y_pred, dtype=tf.float32)
target = tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits), axis=2))
losses = -tf.reduce_sum(target * tf.log(logits), axis=2)
sequence_lengths = tf.constant([3, 4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length), [0, 1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)

with tf.Session() as sess:
    c_e = sess.run(cross_entropy)
    m_c_e = sess.run(masked_loss)
    print("tf unmasked_loss:", c_e)
    print("tf masked_loss:", m_c_e)
The outputs in Keras and TensorFlow are compared as follows:
As shown above, masking is disabled after certain kinds of layers. So how can I mask the loss function in Keras when those layers are added?
Answer
If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in the correct way, the loss on the padding placeholders will be ignored.
Explaining the whole process is a bit involved, so I'll break it down into several steps:
- In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are omitted for clarity).
weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
    masks = [None for _ in self.outputs]
if not isinstance(masks, list):
    masks = [masks]

# Compute total loss.
total_loss = None
with K.name_scope('loss'):
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        weighted_loss = weighted_losses[i]
        sample_weight = sample_weights[i]
        mask = masks[i]
        with K.name_scope(self.output_names[i] + '_loss'):
            output_loss = weighted_loss(y_true, y_pred,
                                        sample_weight, mask)
- Inside Model.compute_mask(), run_internal_graph() is called.
- Inside run_internal_graph(), the masks in the model are propagated layer-by-layer from the model's inputs to its outputs by calling Layer.compute_mask() for each layer iteratively (a condensed view of how the initial mask is created follows this list).
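For reference, the Masking layer creates that initial mask by keeping every timestep in which at least one feature differs from mask_value. A condensed sketch of its compute_mask logic (based on the Keras 2.x source, written as a standalone function for illustration):

from keras import backend as K

def masking_compute_mask(inputs, mask_value=0.):
    # a timestep is kept (True) if ANY of its features differs from mask_value
    return K.any(K.not_equal(inputs, mask_value), axis=-1)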
So if you're using a Masking layer in your model, you don't need to worry about the loss on the padding placeholders. The loss on those entries will be masked out, as you've probably already seen inside _weighted_masked_objective().
import numpy as np
from keras.models import Model
from keras.layers import Input, Masking, LSTM

max_sentence_length = 5
character_number = 2

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')

X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
              [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[-0.11980877 0.05803877 0.07880752]
[-0.00429189 0.13382857 0.19167568]
[ 0.06817091 0.19093043 0.26219055]]
[[ 0. 0. 0. ]
[ 0.0651961 0.10283815 0.12413475]
[-0.04420842 0.137494 0.13727818]
[ 0.04479844 0.17440712 0.24715884]
[ 0.11117355 0.21645413 0.30220413]]]
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()

print(model.evaluate(X, y_true))  # 0.881977558136
print(masked_loss)                # 0.881978
print(unmasked_loss)              # 0.917384
As can be seen from this example, the loss on the masked part (the zeros in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
If there's a recurrent layer with return_sequences=False, the mask stops propagating (i.e., the returned mask is None). See RNN.compute_mask():
def compute_mask(self, inputs, mask):
    if isinstance(mask, list):
        mask = mask[0]
    output_mask = mask if self.return_sequences else None
    if self.return_state:
        state_mask = [None for _ in self.states]
        return [output_mask] + state_mask
    else:
        return output_mask
In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of '#') you want the loss to be masked. If so, you need to mask the loss values in a way somewhat similar to Daniel's answer.
The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). Also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.
from keras import backend as K

def get_loss(mask_value):
    mask_value = K.variable(mask_value)
    def masked_categorical_crossentropy(y_true, y_pred):
        # find out which timesteps in `y_true` are not the padding character '#'
        mask = K.all(K.equal(y_true, mask_value), axis=-1)
        mask = 1 - K.cast(mask, K.floatx())
        # multiply categorical_crossentropy by the mask
        loss = K.categorical_crossentropy(y_true, y_pred) * mask
        # take the average w.r.t. the number of unmasked entries
        return K.sum(loss) / K.sum(mask)
    return masked_categorical_crossentropy

masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
The output of the above code then shows that the loss is computed only on the unmasked values:
model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339
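As a quick numpy cross-check (reusing y_true and y_pred from the question's snippet), the masked value can be reproduced by dropping the '#' timesteps by hand; it should land on the same ~1.08339:

# '#' timesteps are where y_true equals [0, 0, 1]
pad = np.all(y_true == np.array([0, 0, 1]), axis=-1)  # shape [batch, time]
ce = -np.sum(y_true * np.log(y_pred), axis=-1)        # per-timestep cross-entropy
print(ce[~pad].mean())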
The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
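To see why the axis matters, a small sketch (TF 1.x, same lengths as above): reversing along axis 1 flips each row so the padding moves to the front, while including axis 0 additionally swaps the two samples' masks across the batch, pairing each mask with the wrong sentence:

import tensorflow as tf

mask = tf.sequence_mask([3, 4], maxlen=5)  # True on the first 3 / 4 timesteps
with tf.Session() as sess:
    print(sess.run(tf.reverse(mask, [1])))     # [[F F T T T], [F T T T T]] -- padding at the front
    print(sess.run(tf.reverse(mask, [0, 1])))  # rows swapped too: masks no longer match samples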