How can I use TensorFlow's sampled softmax loss function in a Keras model?

Problem Description

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure of what is expected as input for these. It seems like I could write a custom function in Keras as follows:

import keras.backend as K

def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
    return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)
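For reference (this sketch is not part of the original question), the TF 1.x documentation describes the expected argument shapes for tf.nn.sampled_softmax_loss as follows; all variable names here are illustrative:

import tensorflow as tf

dim, num_classes = 300, 10000
weights = tf.Variable(tf.random_normal([num_classes, dim]))  # [num_classes, dim]
biases = tf.Variable(tf.zeros([num_classes]))                # [num_classes]
inputs = tf.placeholder(tf.float32, [None, dim])  # activations feeding the
                                                  # final layer: [batch, dim]
labels = tf.placeholder(tf.int64, [None, 1])      # true class ids: [batch, 1]
# Returns per-example losses of shape [batch] -- not class predictions.
loss = tf.nn.sampled_softmax_loss(weights=weights, biases=biases,
                                  labels=labels, inputs=inputs,
                                  num_sampled=64, num_classes=num_classes)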

However, I'm unsure of how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or does this need to be written as a layer that comes after the final fully-connected layer? Any guidance here would be greatly appreciated. Thanks.

Recommended Answer

The key observation here is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the set of possible labels to compare with the ground truth data to then compute losses as a separate step. This makes the model setup a little bit weird.

First, we add a second input layer to the model that encodes the target (training) data a second time as an input, in addition to being the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we go to instantiate and set up the model.
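To make the double feeding concrete, here is a hedged sketch of a training call against the two-input model constructed at the end of this answer (the data arrays are illustrative stand-ins):

import numpy as np

# Toy stand-in data: features of shape (n, 300) and one-hot targets of
# shape (n, num_classes), matching the model constructed below.
n, num_classes = 64, 200000
x_train = np.random.rand(n, 300).astype('float32')
true_ids = np.random.randint(0, num_classes, size=n)
y_train_onehot = np.zeros((n, num_classes), dtype='float32')
y_train_onehot[np.arange(n), true_ids] = 1.0

# The one-hot targets are passed as the *second model input*; the y
# argument is a dummy array that custom_loss never inspects.
dummy_y = np.zeros((n, 1), dtype='float32')
model.fit([x_train, y_train_onehot], dummy_y, batch_size=32, epochs=1)

At a 200,000-class vocabulary the one-hot target matrix gets large quickly, so in practice you would likely stream batches from a generator instead of materializing it all at once.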

Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and then the second input that contains a copy of the training data. Note that we're doing some serious hackery accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
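To illustrate that hack in isolation, here is a tiny self-contained example of what _keras_history exposes in the Keras 2.x / TF 1.x era this answer targets:

from keras.layers import Input, Dense

# Every tensor produced by a Keras layer records its origin as a tuple
# (layer, node_index, tensor_index) in _keras_history.
x = Input(shape=(4,))
d = Dense(10)
t = d(x)
layer = t._keras_history[0]
assert layer is d                        # we recovered the Dense layer...
kernel, bias = layer.kernel, layer.bias  # ...and with it, its parameters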

Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.

Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax to the original dense layer (which was built with the default linear activation); a sketch of that follows the training code below.

There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here now as-is rather than wait until I have something that's a little bit neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.

from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)

    def call(self, inputs):
        """
        The first input should be the output of the dense layer, and the
        second the target (i.e., a repeat of the training data) used to
        compute the labels argument.
        """
        # Recover the Dense layer that produced inputs[0] via the
        # _keras_history bookkeeping Keras attaches to its tensors.
        dense_layer = inputs[0]._keras_history[0]
        # labels must be batch_size x 1, where the value at (i, 0) is the
        # index of the true class, e.g. (0, 0, 1) => 2 or (0, 1, 0, 0) => 1
        labels = K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1])
        # sampled_softmax_loss expects weights of shape [num_classes, dim]
        # and inputs of shape [batch_size, dim], so we transpose the Dense
        # kernel and feed the activations going *into* the Dense layer.
        return K.tf.nn.sampled_softmax_loss(
            weights=K.tf.transpose(dense_layer.kernel),
            biases=dense_layer.bias,
            inputs=dense_layer.input,
            labels=labels,
            num_sampled=1000,
            num_classes=200000)

def custom_loss(y_true, y_pred):
    # y_pred already holds the per-example sampled softmax losses, so we
    # just average them and ignore the (dummy) y_true entirely.
    return K.tf.reduce_mean(y_pred)


num_classes = 200000
input = Input(shape=(300,))
# Second input: the one-hot encoded targets, consumed by SampledSoftmax
target_input = Input(shape=(num_classes,))

dense = Dense(num_classes)

outputs = dense(input)
outputs = SampledSoftmax()([outputs, target_input])

model = Model([input, target_input], outputs)
model.compile(optimizer='adam', loss=custom_loss)
# train as desired
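
The answer above only describes the inference-time rework in prose; here is a minimal sketch of what that "inference version" could look like, re-using the trained dense layer from the training code (the names inference_input and inference_model are illustrative):

from keras.layers import Activation

# Re-use the trained dense layer in a single-input model and apply a full
# softmax over all classes, which is what validation/inference needs.
inference_input = Input(shape=(300,))
logits = dense(inference_input)              # shares the trained weights
predictions = Activation('softmax')(logits)
inference_model = Model(inference_input, predictions)
# inference_model.predict(...) now yields normalized class probabilities.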
