How do I mask a loss function in Keras with the TensorFlow backend?

Problem Description

I am trying to implement a sequence-to-sequence task using an LSTM in Keras with the TensorFlow backend. The inputs are English sentences of variable length. To construct a dataset with the 2-D shape [batch_number, max_sentence_length], I append an EOF marker to each sentence and pad it with enough placeholder characters, e.g. #. Each character in the sentence is then transformed into a one-hot vector, so that the dataset has the 3-D shape [batch_number, max_sentence_length, character_number]. After the LSTM encoder and decoder layers, the softmax cross-entropy between output and target is computed.
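For concreteness, a minimal sketch of this preprocessing (the toy alphabet, the EOF marker '$', and the encode helper are placeholders for illustration, not the actual setup):

import numpy as np

vocab = {'a': 0, 'b': 1, '$': 2, '#': 3}   # '$' = EOF, '#' = padding
max_sentence_length = 6

def encode(sentence):
    # append EOF, pre-pad with '#' to a fixed length, then one-hot encode
    padded = ('#' * max_sentence_length + sentence + '$')[-max_sentence_length:]
    one_hot = np.zeros((max_sentence_length, len(vocab)))
    one_hot[np.arange(max_sentence_length), [vocab[c] for c in padded]] = 1
    return one_hot

X = np.stack([encode(s) for s in ['abb', 'babb']])   # shape (2, 6, 4)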

To eliminate the effect of padding on model training, masking can be applied to both the input and the loss function. Input masking in Keras can be done with layers.core.Masking. In TensorFlow, masking the loss function can be done as described here: custom masked loss function in TensorFlow.

However, I can't find a way to realize this in Keras, since a user-defined loss function in Keras only accepts the parameters y_true and y_pred. So how can I pass the true sequence_lengths to the loss function and mask it?
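For illustration, the kind of workaround I have in mind is building the loss in a closure so that extra tensors can reach it despite the fixed signature. A rough sketch (make_masked_loss is a hypothetical helper, and this only works if the batch composition is fixed and known in advance):

import tensorflow as tf
from keras import backend as K

def make_masked_loss(sequence_lengths, max_len):
    # per-sample true lengths, baked in at graph-construction time
    mask = tf.cast(tf.sequence_mask(sequence_lengths, maxlen=max_len), K.floatx())
    def loss(y_true, y_pred):
        # with pre-padding (as in '##abb'), the mask would additionally have to be
        # reversed along the time axis, as in the tf.reverse call further below
        ce = K.categorical_crossentropy(y_true, y_pred)   # shape (batch, time)
        return K.sum(ce * mask) / K.sum(mask)
    return loss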

Besides, I found a function _weighted_masked_objective(fn) in keras/engine/training.py. Its definition is

Adds support for masking and sample-weighting to an objective function.

But it seems that the function can only accept fn(y_true, y_pred). Is there a way to use this function to solve my problem?

To be specific, I modified Yu-Yang's example:

from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)

max_sentence_length = 5
character_number = 3  # valid characters 'a', 'b' and placeholder '#'

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)

model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()

# the batch is ['##abb', '#babb'], with padding character '#'
# in X, '#' is the all-zero vector (so it is masked); in y_true, '#' is one-hot [0, 0, 1]
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
              [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
                   [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])

y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))

# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits = tf.constant(y_pred, dtype=tf.float32)
target = tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits), axis=2))
losses = -tf.reduce_sum(target * tf.log(logits), axis=2)
sequence_lengths = tf.constant([3, 4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length), [0, 1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
    c_e = sess.run(cross_entropy)
    m_c_e = sess.run(masked_loss)
    print("tf unmasked_loss:", c_e)
    print("tf masked_loss:", m_c_e)

The outputs from Keras and TensorFlow compare as follows (the loss from model.evaluate() does not match the masked TF loss):

As shown above, masking is disabled after certain kinds of layers. So how do I mask the loss function in Keras when those layers are added?

Recommended Answer

If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in the correct way, the loss on the padding placeholders will be ignored.

It's a bit involved to explain the whole process, so I'll break it down into several steps:

  1. In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are ignored for clarity).

weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
    masks = [None for _ in self.outputs]
if not isinstance(masks, list):
    masks = [masks]

# Compute total loss.
total_loss = None
with K.name_scope('loss'):
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        weighted_loss = weighted_losses[i]
        sample_weight = sample_weights[i]
        mask = masks[i]
        with K.name_scope(self.output_names[i] + '_loss'):
            output_loss = weighted_loss(y_true, y_pred,
                                        sample_weight, mask)

  2. Inside Model.compute_mask(), run_internal_graph() is called.
  3. Inside run_internal_graph(), the masks in the model are propagated layer-by-layer from the model's inputs to its outputs, by calling Layer.compute_mask() on each layer iteratively (see the abridged snippet below).
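For reference, the Masking layer's compute_mask() marks a timestep as valid whenever any of its features differ from mask_value; abridged from the Keras 2 source (details may vary between versions):

# inside keras.layers.core.Masking (abridged)
def compute_mask(self, inputs, mask=None):
    return K.any(K.not_equal(inputs, self.mask_value), axis=-1)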

So if you're using a Masking layer in your model, you don't need to worry about the loss on the padding placeholders. The loss on those entries will be masked out, as you may have already seen inside _weighted_masked_objective().
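For reference, the masking logic inside _weighted_masked_objective() looks roughly like this (abridged sketch; the exact code varies across Keras versions):

def _weighted_masked_objective(fn):
    def weighted(y_true, y_pred, weights, mask=None):
        score_array = fn(y_true, y_pred)   # per-timestep loss values
        if mask is not None:
            mask = K.cast(mask, K.floatx())
            score_array *= mask            # zero out masked timesteps
            score_array /= K.mean(mask)    # rescale: average over unmasked entries only
        # ... sample weighting omitted ...
        return K.mean(score_array)
    return weighted

The following example confirms this behavior end-to-end: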

import numpy as np
from keras.models import Model
from keras.layers import Input, Masking, LSTM

max_sentence_length = 5
character_number = 2

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')

X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
              [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0.          0.          0.        ]
  [ 0.          0.          0.        ]
  [-0.11980877  0.05803877  0.07880752]
  [-0.00429189  0.13382857  0.19167568]
  [ 0.06817091  0.19093043  0.26219055]]

 [[ 0.          0.          0.        ]
  [ 0.0651961   0.10283815  0.12413475]
  [-0.04420842  0.137494    0.13727818]
  [ 0.04479844  0.17440712  0.24715884]
  [ 0.11117355  0.21645413  0.30220413]]]

# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()

print(model.evaluate(X, y_true))
0.881977558136

print(masked_loss)
0.881978

print(unmasked_loss)
0.917384

As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.

If there's a recurrent layer with return_sequences=False, the mask stops propagating (i.e., the returned mask is None). See RNN.compute_mask():

def compute_mask(self, inputs, mask):
    if isinstance(mask, list):
        mask = mask[0]
    output_mask = mask if self.return_sequences else None
    if self.return_state:
        state_mask = [None for _ in self.states]
        return [output_mask] + state_mask
    else:
        return output_mask

In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.

The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). Also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.

import numpy as np
from keras import backend as K
from keras.models import Model

def get_loss(mask_value):
    mask_value = K.variable(mask_value)
    def masked_categorical_crossentropy(y_true, y_pred):
        # find out which timesteps in `y_true` are not the padding character '#'
        mask = K.all(K.equal(y_true, mask_value), axis=-1)
        mask = 1 - K.cast(mask, K.floatx())

        # multiply categorical_crossentropy with the mask
        loss = K.categorical_crossentropy(y_true, y_pred) * mask

        # take average w.r.t. the number of unmasked entries
        return K.sum(loss) / K.sum(mask)
    return masked_categorical_crossentropy

masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')

The output of the above code then shows that the loss is computed only on the unmasked values:

model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339

The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
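That is, the mask line in the verification snippet becomes:

mask = tf.reverse(tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length), [1])   # flip the time axis only

Reversing over [0, 1] flips the batch axis as well, which pairs each sequence length with the wrong sample.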
