How can I use TensorFlow's sampled softmax loss function in a Keras model?

Question

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure what is expected as input for these. It seems I could write a custom function in Keras as follows:

import keras.backend as K

def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
    return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)

However, I'm unsure how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:

model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(output_dim=len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or does this need to be written as a layer that comes after the final fully-connected layer? Any guidance here would be greatly appreciated. Thanks.

Recommended answer

The key observation here is that the TensorFlow sampled softmax function returns the actual losses, not a set of predictions over the possible labels that you would then compare with the ground truth data to compute a loss as a separate step. This makes the model setup a little bit weird.
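To make that observation concrete, here is a minimal pure-Python sketch of what a sampled softmax loss computes for a single example. It is an illustration only, not TensorFlow's implementation: the helper name sampled_softmax_loss_single, the uniform negative sampling, and the omission of the sampling-probability correction that tf.nn.sampled_softmax_loss applies by default are all simplifying assumptions.

```python
import math
import random

def sampled_softmax_loss_single(weights, biases, hidden, label, num_sampled, rng):
    """Cross-entropy over the true class plus a few sampled negatives.

    weights: one row per class, each row the same length as hidden
    biases:  one value per class
    hidden:  the feature vector fed to the output layer
    label:   index of the true class
    """
    num_classes = len(weights)
    # Assumed simplification: draw negatives uniformly, excluding the label.
    candidates = [c for c in range(num_classes) if c != label]
    negatives = rng.sample(candidates, num_sampled)
    classes = [label] + negatives  # the true class sits at position 0

    # Logits are computed only for the sampled subset, not the whole vocabulary.
    logits = [
        sum(w * h for w, h in zip(weights[c], hidden)) + biases[c]
        for c in classes
    ]

    # Numerically stable log-sum-exp; the loss is -log p(true class).
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[0]


rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
biases = [0.0] * 50
hidden = [0.5, -0.2, 0.1, 0.9]
loss = sampled_softmax_loss_single(weights, biases, hidden, label=7,
                                   num_sampled=5, rng=rng)
print(loss)  # a single loss value, not a vector of class probabilities
```

The function returns one number per example and never produces a probability for the classes that were not sampled, which is why the result cannot be fed into a standard loss such as categorical cross-entropy.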

First, we add a second input layer to the model that feeds in the target (training) data a second time as an input, in addition to its role as the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we instantiate and set up the model.

Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras tensors as its inputs: the output of the dense layer that predicts our classes, and the second input that contains a copy of the training data. Note that we're doing some serious hackery by accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
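The labels argument is the part that is easy to get wrong: sampled_softmax_loss expects integer class indices with shape (batch size, 1) rather than one-hot rows. A minimal pure-Python sketch of that conversion, mirroring the argmax followed by reshape used in the layer's call method (the helper name one_hot_to_label_column is made up for illustration):

```python
def one_hot_to_label_column(one_hot_rows):
    """Turn one-hot rows into a column of class indices, shape (batch, 1)."""
    # row.index(max(row)) is the argmax; wrapping it in a list adds the
    # trailing dimension of size 1.
    return [[row.index(max(row))] for row in one_hot_rows]


batch = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(one_hot_to_label_column(batch))  # [[2], [1], [0]]
```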

Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.

Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax function to the output of the original dense layer, which uses the default (linear) activation.
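As a rough illustration of that inference step, the sketch below applies an ordinary softmax over every class, using output-layer weights of the kind you would copy out of the trained model. The function name predict_proba and the tiny hand-written weights are placeholders, not part of the original answer.

```python
import math

def predict_proba(weights, biases, hidden):
    """Full softmax over all classes for one feature vector."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder weights for a 3-class output layer with 2 input features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba(weights, biases, hidden=[2.0, 0.5])
predicted_class = probs.index(max(probs))
print(predicted_class)  # 0
```

Unlike the training-time loss, this touches every class, so it is slow for a very large vocabulary, but it is only needed at validation or inference time.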

There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here as-is rather than wait until I have something a little neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.

import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Dense, Layer

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)

    def call(self, inputs):
        """
        The first input should be the output of the dense layer that
        predicts the classes, and the second the target (i.e., a repeat of
        the training data), which is used to compute the labels argument.
        """
        # The dense layer that produced inputs[0]. _keras_history is a
        # private attribute, so this lookup is fragile across Keras versions.
        dense_layer = inputs[0]._keras_history[0]
        # The labels argument is batch size by 1, where the value at
        # position (i, 0) is the index of the true (non-zero) class,
        # e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
        labels = tf.reshape(tf.argmax(inputs[1], 1), [-1, 1])
        # Note: tf.nn.sampled_softmax_loss documents weights as
        # [num_classes, dim] and inputs as [batch_size, dim]; check the
        # shapes you pass against that before relying on this.
        return tf.nn.sampled_softmax_loss(weights=dense_layer.weights[0],
                                          biases=dense_layer.bias,
                                          inputs=inputs[0],
                                          labels=labels,
                                          num_sampled=1000,
                                          num_classes=200000)

def custom_loss(y_true, y_pred):
    # y_pred already holds the per-example losses, so y_true is ignored.
    return tf.reduce_mean(y_pred)


num_classes = 200000
model_input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))

dense = Dense(num_classes)

outputs = dense(model_input)
outputs = SampledSoftmax()([outputs, target_input])

model = Model([model_input, target_input], outputs)
model.compile(optimizer='adam', loss=custom_loss)
# train as desired
