How do I mask a loss function in Keras with the TensorFlow backend?


Problem Description


I am trying to implement a sequence-to-sequence task using an LSTM in Keras with the TensorFlow backend. The inputs are English sentences of variable length. To construct a dataset with the 2-D shape [batch_number, max_sentence_length], I add EOF at the end of each sentence and pad it with enough placeholder characters, e.g. #. Each character in the sentence is then transformed into a one-hot vector, so that the dataset has the 3-D shape [batch_number, max_sentence_length, character_number]. After the LSTM encoder and decoder layers, the softmax cross-entropy between the output and the target is computed.

To eliminate the padding effect in model training, masking could be used on both the input and the loss function. Masking the input in Keras can be done with layers.core.Masking. In TensorFlow, masking the loss function can be done as described in custom masked loss function in TensorFlow.
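Roughly, the idea there is to build a boolean mask from the true sequence lengths and average the per-timestep losses only over the unmasked positions. A sketch (using the same tensor names as the script further below; the exact form in the linked post may differ):

# per-timestep cross-entropy, then keep only the real (non-padding) timesteps
losses = -tf.reduce_sum(target * tf.log(logits), axis=2)
mask = tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length)
# (with padding at the start of each sequence, the mask would additionally
#  need to be reversed along the time axis, as in the script below)
masked_loss = tf.reduce_mean(tf.boolean_mask(losses, mask))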

However, I can't find a way to realize this in Keras, since a user-defined loss function in Keras only accepts the parameters y_true and y_pred. So how can I pass the true sequence_lengths to the loss function and mask it?

Besides, I found a function _weighted_masked_objective(fn) in keras/engine/training.py. Its docstring says:

Adds support for masking and sample-weighting to an objective function.

But it seems that the function can only accept fn(y_true, y_pred). Is there a way to use this function to solve my problem?
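For context, that wrapper roughly does the following (a simplified sketch of the Keras 2.x code with sample weighting omitted, not the verbatim source):

from keras import backend as K

def _weighted_masked_objective(fn):
    """Wrap an objective fn(y_true, y_pred) so that a mask can be applied to it."""
    def weighted(y_true, y_pred, weights, mask=None):
        score_array = fn(y_true, y_pred)          # raw per-timestep loss
        if mask is not None:
            mask = K.cast(mask, K.floatx())
            score_array *= mask                   # zero out masked entries
            score_array /= K.mean(mask)           # renormalize by the fraction kept
        # (sample weighting via `weights` omitted in this sketch)
        return K.mean(score_array)
    return weighted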

To be specific, I modified the example from Yu-Yang's answer.

from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)

max_sentence_length = 5
character_number = 3  # valid characters 'a' and 'b', plus the placeholder '#'

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)

model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()

X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
          [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]], # the batch is ['##abb','#babb'], padding '#'
          [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])

y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits=tf.constant(y_pred, dtype=tf.float32)
target=tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits),axis=2))
losses = -tf.reduce_sum(target * tf.log(logits),axis=2)
sequence_lengths=tf.constant([3,4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths,maxlen=max_sentence_length),[0,1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
    c_e = sess.run(cross_entropy)
    m_c_e=sess.run(masked_loss)
    print("tf unmasked_loss:", c_e)
    print("tf masked_loss:", m_c_e)

The outputs from Keras and TensorFlow can then be compared.

As this comparison shows, masking is disabled after some kinds of layers. So how can I mask the loss function in Keras when those layers are added?

Solution

If there's a mask in your model, it'll be propagated layer by layer and eventually applied to the loss. So if you're padding and masking the sequences in the correct way, the loss on the padding placeholders will be ignored.

Some Details:

It's a bit involved to explain the whole process, so I'll just break it down into several steps:

  1. In compile(), the masks are collected by calling compute_mask() and applied to the loss(es) (irrelevant lines omitted for clarity).

weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
    masks = [None for _ in self.outputs]
if not isinstance(masks, list):
    masks = [masks]

# Compute total loss.
total_loss = None
with K.name_scope('loss'):
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        weighted_loss = weighted_losses[i]
        sample_weight = sample_weights[i]
        mask = masks[i]
        with K.name_scope(self.output_names[i] + '_loss'):
            output_loss = weighted_loss(y_true, y_pred,
                                        sample_weight, mask)

  2. Inside Model.compute_mask(), run_internal_graph() is called.
  3. Inside run_internal_graph(), the masks in the model are propagated layer by layer from the model's inputs to its outputs by calling Layer.compute_mask() for each layer iteratively.

So if you're using a Masking layer in your model, you don't need to worry about the loss on the padding placeholders. The loss on those entries will be masked out, as you've probably already seen inside _weighted_masked_objective().

A Small Example:

import numpy as np
from keras.models import Model
from keras.layers import Input, Masking, LSTM

max_sentence_length = 5
character_number = 2

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')

X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
              [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0.          0.          0.        ]
  [ 0.          0.          0.        ]
  [-0.11980877  0.05803877  0.07880752]
  [-0.00429189  0.13382857  0.19167568]
  [ 0.06817091  0.19093043  0.26219055]]

 [[ 0.          0.          0.        ]
  [ 0.0651961   0.10283815  0.12413475]
  [-0.04420842  0.137494    0.13727818]
  [ 0.04479844  0.17440712  0.24715884]
  [ 0.11117355  0.21645413  0.30220413]]]

# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()

print(model.evaluate(X, y_true))
0.881977558136

print(masked_loss)
0.881978

print(unmasked_loss)
0.917384

As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
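If you want to look at the propagated mask directly rather than inferring it from the zeros in y_pred, you can evaluate Model.compute_mask() on the example above. This is just an inspection sketch (assuming Keras 2.x with the TensorFlow backend):

from keras import backend as K

# output mask of the model above (Masking -> LSTM(return_sequences=True))
mask_tensor = model.compute_mask(model.inputs, mask=None)
if isinstance(mask_tensor, list):  # may be wrapped in a list, depending on the version
    mask_tensor = mask_tensor[0]

get_mask = K.function(model.inputs, [mask_tensor])
print(get_mask([X])[0])
# expected: False at the all-zero (padded) timesteps, True elsewhere, e.g.
# [[False False  True  True  True]
#  [False  True  True  True  True]]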


EDIT:

If there's a recurrent layer with return_sequences=False, the mask stops propagating (i.e., the returned mask is None). See RNN.compute_mask():

def compute_mask(self, inputs, mask):
    if isinstance(mask, list):
        mask = mask[0]
    output_mask = mask if self.return_sequences else None
    if self.return_state:
        state_mask = [None for _ in self.states]
        return [output_mask] + state_mask
    else:
        return output_mask
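In the encoder-decoder model from the question, the encoder LSTM(10, return_sequences=False) is exactly such a layer, so the mask is dropped there and never reaches the loss. One way to confirm this (a sketch; here model refers to the encoder-decoder model built in the question, not the small example above):

# the propagated output mask of the question's model
print(model.compute_mask(model.inputs, mask=None))
# expected: None (or a list containing None), i.e. no mask reaches the output/loss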

In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.

The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). And also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.

import numpy as np
from keras import backend as K

def get_loss(mask_value):
    mask_value = K.variable(mask_value)
    def masked_categorical_crossentropy(y_true, y_pred):
        # find the timesteps in `y_true` that are the padding character '#' ...
        mask = K.all(K.equal(y_true, mask_value), axis=-1)
        # ... and invert, so the mask is 1 for real timesteps and 0 for padding
        mask = 1 - K.cast(mask, K.floatx())

        # multiply categorical_crossentropy with the mask
        loss = K.categorical_crossentropy(y_true, y_pred) * mask

        # take the average w.r.t. the number of unmasked entries
        return K.sum(loss) / K.sum(mask)
    return masked_categorical_crossentropy

# `input_tensor` and `output` are from the model defined in the question
masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
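The Keras-side number below comes from evaluating the recompiled model on the padded batch from the question (X and y_true as defined there); the two TensorFlow numbers come from rerunning the question's TensorFlow check:

print('model.evaluate:', model.evaluate(X, y_true))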

The output of the above code then shows that the loss is computed only on the unmasked values:

model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339

The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
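For reference, that change corresponds to building the mask in the question's TensorFlow check like this, reversing only the time axis so the leading padding positions are the ones masked out:

# reverse only the time axis (axis 1); reversing [0, 1] also flips the batch order,
# so the mask no longer lines up with the padded sequences
mask = tf.reverse(tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length), [1])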
