Create an LSTM layer with Attention in Keras for multi-label text classification neural network


Problem description

Greetings, dear members of the community. I am creating a neural network to predict a multi-label y. Specifically, the network takes 5 inputs (list of actors, plot summary, movie features, movie reviews, title) and tries to predict the sequence of movie genres. In the network I use Embedding layers and Global Max Pooling layers.

However, I recently discovered recurrent layers with attention, which are a very interesting topic these days in machine translation. So, I wondered if I could use one of those layers, but only for the Plot Summary input. Note that I am not doing machine translation but rather text classification.

Current state of my neural network

def create_fit_keras_model(hparams,
                           version_data_control,
                           optimizer_name,
                           validation_method,
                           callbacks,
                           optimizer_version = None):

    sentenceLength_actors = X_train_seq_actors.shape[1]
    vocab_size_frequent_words_actors = len(actors_tokenizer.word_index)

    sentenceLength_plot = X_train_seq_plot.shape[1]
    vocab_size_frequent_words_plot = len(plot_tokenizer.word_index)

    sentenceLength_features = X_train_seq_features.shape[1]
    vocab_size_frequent_words_features = len(features_tokenizer.word_index)

    sentenceLength_reviews = X_train_seq_reviews.shape[1]
    vocab_size_frequent_words_reviews = len(reviews_tokenizer.word_index)

    sentenceLength_title = X_train_seq_title.shape[1]
    vocab_size_frequent_words_title = len(title_tokenizer.word_index)

    model = keras.Sequential(name='{0}_{1}dim_{2}batchsize_{3}lr_{4}decaymultiplier_{5}'.format(sequential_model_name, 
                                                                                                str(hparams[HP_EMBEDDING_DIM]), 
                                                                                                str(hparams[HP_HIDDEN_UNITS]),
                                                                                                str(hparams[HP_LEARNING_RATE]), 
                                                                                                str(hparams[HP_DECAY_STEPS_MULTIPLIER]),
                                                                                                version_data_control))
    actors = keras.Input(shape=(sentenceLength_actors,), name='actors_input')
    plot = keras.Input(shape=(sentenceLength_plot,), batch_size=hparams[HP_HIDDEN_UNITS], name='plot_input')
    features = keras.Input(shape=(sentenceLength_features,), name='features_input')
    reviews = keras.Input(shape=(sentenceLength_reviews,), name='reviews_input')
    title = keras.Input(shape=(sentenceLength_title,), name='title_input')

    emb1 = layers.Embedding(input_dim = vocab_size_frequent_words_actors + 2,
                            output_dim = 16, #hparams[HP_EMBEDDING_DIM], hyperparametered or fixed sized.
                            embeddings_initializer = 'uniform',
                            mask_zero = True,
                            input_length = sentenceLength_actors,
                            name="actors_embedding_layer")(actors)
    
    # encoded_layer1 = layers.GlobalAveragePooling1D(name="globalaveragepooling_actors_layer")(emb1)
    encoded_layer1 = layers.GlobalMaxPooling1D(name="globalmaxpooling_actors_layer")(emb1)
    
    emb2 = layers.Embedding(input_dim = vocab_size_frequent_words_plot + 2,
                            output_dim = hparams[HP_EMBEDDING_DIM],
                            embeddings_initializer = 'uniform',
                            mask_zero = True,
                            input_length = sentenceLength_plot,
                            name="plot_embedding_layer")(plot)
    # (Option 1)
    # encoded_layer2 = layers.GlobalMaxPooling1D(name="globalmaxpooling_plot_summary_Layer")(emb2)
 
    # (Option 2)
    emb2 = layers.Bidirectional(layers.LSTM(hparams[HP_EMBEDDING_DIM], return_sequences=True))(emb2)
    avg_pool = layers.GlobalAveragePooling1D()(emb2)
    max_pool = layers.GlobalMaxPooling1D()(emb2)
    conc = layers.concatenate([avg_pool, max_pool])

    # (Option 3)
    # emb2 = layers.Bidirectional(layers.LSTM(hparams[HP_EMBEDDING_DIM], return_sequences=True))(emb2)
    # emb2 = layers.Bidirectional(layers.LSTM(hparams[HP_EMBEDDING_DIM], return_sequences=True))(emb2)
    # emb2 = AttentionWithContext()(emb2)

    emb3 = layers.Embedding(input_dim = vocab_size_frequent_words_features + 2,
                            output_dim = hparams[HP_EMBEDDING_DIM],
                            embeddings_initializer = 'uniform',
                            mask_zero = True,
                            input_length = sentenceLength_features,
                            name="features_embedding_layer")(features)
    
    # encoded_layer3 = layers.GlobalAveragePooling1D(name="globalaveragepooling_movie_features_layer")(emb3)
    encoded_layer3 = layers.GlobalMaxPooling1D(name="globalmaxpooling_movie_features_layer")(emb3)
    
    emb4 = layers.Embedding(input_dim = vocab_size_frequent_words_reviews + 2,
                            output_dim = hparams[HP_EMBEDDING_DIM],
                            embeddings_initializer = 'uniform',
                            mask_zero = True,
                            input_length = sentenceLength_reviews,
                            name="reviews_embedding_layer")(reviews)
    
    # encoded_layer4 = layers.GlobalAveragePooling1D(name="globalaveragepooling_user_reviews_layer")(emb4)
    encoded_layer4 = layers.GlobalMaxPooling1D(name="globalmaxpooling_user_reviews_layer")(emb4)

    emb5 = layers.Embedding(input_dim = vocab_size_frequent_words_title + 2,
                            output_dim = hparams[HP_EMBEDDING_DIM],
                            embeddings_initializer = 'uniform',
                            mask_zero = True,
                            input_length = sentenceLength_title,
                            name="title_embedding_layer")(title)
    
    # encoded_layer5 = layers.GlobalAveragePooling1D(name="globalaveragepooling_movie_title_layer")(emb5)
    encoded_layer5 = layers.GlobalMaxPooling1D(name="globalmaxpooling_movie_title_layer")(emb5)

    merged = layers.concatenate([encoded_layer1, conc, encoded_layer3, encoded_layer4, encoded_layer5], axis=-1) #(Option 2)
    # merged = layers.concatenate([encoded_layer1, emb2, encoded_layer3, encoded_layer4, encoded_layer5], axis=-1) #(Option 3)

    dense_layer_1 = layers.Dense(hparams[HP_HIDDEN_UNITS],
                                 kernel_regularizer=regularizers.l2(neural_network_parameters['l2_regularization']),
                                 activation=neural_network_parameters['dense_activation'],
                                 name="1st_dense_hidden_layer_concatenated_inputs")(merged)
    
    dense_layer_1 = layers.Dropout(neural_network_parameters['dropout_rate'])(dense_layer_1) # assign the result, otherwise the dropout is never applied
    
    output_layer = layers.Dense(neural_network_parameters['number_target_variables'],
                                activation=neural_network_parameters['output_activation'],
                                name='output_layer')(dense_layer_1)

    model = keras.Model(inputs=[actors, plot, features, reviews, title], outputs=output_layer, name='{0}_{1}dim_{2}batchsize_{3}lr_{4}decaymultiplier_{5}'.format(sequential_model_name, 
                                                                                                                                                                  str(hparams[HP_EMBEDDING_DIM]), 
                                                                                                                                                                  str(hparams[HP_HIDDEN_UNITS]),
                                                                                                                                                                  str(hparams[HP_LEARNING_RATE]), 
                                                                                                                                                                  str(hparams[HP_DECAY_STEPS_MULTIPLIER]),
                                                                                                                                                                  version_data_control))
    print(model.summary())
    
#     pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.0,
#                                                             final_sparsity=0.4,
#                                                             begin_step=600,
#                                                             end_step=1000)
    
#     model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule)
    
    if optimizer_name=="adam" and optimizer_version is None:
        
        optimizer = optimizer_adam_v2(hparams)
        
    elif optimizer_name=="sgd" and optimizer_version is None:
        
        optimizer = optimizer_sgd_v1(hparams, "no decay")
        
    elif optimizer_name=="rmsprop" and optimizer_version is None:
        
        optimizer = optimizer_rmsprop_v1(hparams)

    print("here: {0}".format(optimizer.lr))

    lr_metric = [get_lr_metric(optimizer)]
    
    if type(get_lr_metric(optimizer)) in (float, int):

        print("Learning Rate's type is Float or Integer")
        model.compile(optimizer=optimizer,
                      loss=neural_network_parameters['model_loss'],
                      metrics=neural_network_parameters['model_metric'] + lr_metric, )
    else:
        print("Learning Rate's type is not Float or Integer, but rather {0}".format(type(lr_metric)))
        model.compile(optimizer=optimizer,
                      loss=neural_network_parameters['model_loss'],
                      metrics=neural_network_parameters['model_metric'], ) #+ lr_metric

You will see in the structure above that I have 5 input layers and 5 Embedding layers, and that I then apply a Bidirectional LSTM layer only to the Plot Summary input.

However, with the current bidirectional approach on the Plot Summary, I got an error. My question is about how I can utilize attention in text classification, not about solving that error, so please don't post a solution to it.

My question asks for suggested ways to create a recurrent layer with attention for the plot summary (input 2). Also, do not hesitate to mention in the comments any article that might help me achieve this in Keras.

I remain at your disposal if any additional information is required regarding the structure of the neural network.

If you find the neural network above complicated, I can make a simplified version of it. However, the above is my original neural network, so I would like any suggestions to be based on it.

Edit 14.12.2020

Find here the Colab notebook with the code I want to execute. The code includes two answers: one proposed in the comments (from an already answered question), and the other written as an official answer to my question.

The first approach, proposed by @MarcoCerliani, works. However, I would also like the second approach, the one by @Allohvk, to work (both approaches are implemented in runtime cell [21] of the attached Colab). The latter does not work at the moment. The latest error I get is:

ValueError: Input 0 of layer globalmaxpooling_plot_summary_Layer is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 100]

I solved the latest error of my edit by removing the globalmaxpooling_plot_summary_Layer from my neural network's structure.
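
Below is a minimal, hypothetical sketch of why that removal fixes it (the shapes are placeholders; the point is that an attention layer of the kind shown in the answer below already collapses the timestep axis into a single context vector, so a GlobalMaxPooling1D placed after it receives a 2-D tensor instead of the 3-D tensor it expects):

import tensorflow as tf
from tensorflow.keras import layers

# Placeholder shapes for illustration: 32 plot summaries, 200 timesteps, 100 features
lstm_out = tf.random.normal((32, 200, 100))                      # ndim=3: (batch, timesteps, features)
weights = tf.nn.softmax(tf.random.normal((32, 200, 1)), axis=1)  # toy attention weights
context = tf.reduce_sum(lstm_out * weights, axis=1)              # ndim=2: (batch, features)

layers.GlobalMaxPooling1D()(lstm_out)    # fine: pools over the timestep axis of a 3-D input
# layers.GlobalMaxPooling1D()(context)   # raises "expected ndim=3, found ndim=2", hence the removal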

Recommended answer

Let me summarize the intent. You want to add attention to your code. Yours is a sequence classification task and not a seq-to-seq translation task. You don't really care much about how it is done, so you are OK with not debugging the error above; you just need a working piece of code. Our main input here is the movie review consisting of 'n' words to which you want to add attention.

Assume you embed the reviews and pass them to an LSTM layer. Now you want to 'attend' to all the hidden states of the LSTM layer and then generate a classification (instead of just using the last hidden state of the encoder). So an attention layer needs to be inserted. A barebones implementation would look like this:

from tensorflow.keras import backend as K   # K is the Keras backend used below
from tensorflow.keras.layers import Layer

class peel_the_layer(Layer):
    def __init__(self):
        ## Nothing special to be done here
        super(peel_the_layer, self).__init__()

    def build(self, input_shape):
        ## Define the shape of the weights and bias in this layer
        ## This is a 1-unit layer.
        units = 1
        ## The last index of input_shape is the feature dimension of the previous
        ## RNN layer; the last-but-one index is the number of timesteps
        self.w = self.add_weight(name="att_weights", shape=(input_shape[-1], units), initializer="normal") # the name property helps avoid "RuntimeError: Unable to create link"
        self.b = self.add_weight(name="att_bias", shape=(input_shape[-2], units), initializer="zeros")
        super(peel_the_layer, self).build(input_shape)

    def call(self, x):
        ## x is the input tensor: every timestep (word) that needs to be attended to
        ## Below is the main processing done during training
        e = K.tanh(K.dot(x, self.w) + self.b)
        a = K.softmax(e, axis=1)        # attention weights over the timestep axis
        output = x * a

        ## Return the outputs. 'a' is the set of attention weights;
        ## the second value is the 'attention-adjusted output state', i.e. the context vector
        return a, K.sum(output, axis=1)

Now call the above Attention layer after your LSTM and before your Dense output layer.

a, context = peel_the_layer()(lstm_out)
## context is the output which will be the input to your classification layer
## 'a' is the set of attention weights, which you may want to route to a display
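
For concreteness, a minimal, hypothetical sketch of how the plot branch from the question could be wired with this layer is shown below. The vocabulary size, sequence length, dimensions and number of genres are placeholder values, and mask_zero is omitted because this barebones attention layer does not handle masks:

from tensorflow import keras
from tensorflow.keras import layers

# Placeholder values for illustration only
PLOT_LEN, VOCAB_SIZE, EMB_DIM, N_GENRES = 200, 20000, 64, 20

plot = keras.Input(shape=(PLOT_LEN,), name='plot_input')
emb2 = layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMB_DIM,
                        name='plot_embedding_layer')(plot)
lstm_out = layers.Bidirectional(layers.LSTM(EMB_DIM, return_sequences=True))(emb2)

a, context = peel_the_layer()(lstm_out)   # attention weights and 2-D context vector

# 'context' has shape (batch, 2*EMB_DIM), so no further pooling is needed; in the full
# model it would be concatenated with the pooled outputs of the other four branches,
# e.g. layers.concatenate([encoded_layer1, context, encoded_layer3, ...], axis=-1).
# Here the branch is finished off on its own just to keep the sketch runnable:
output = layers.Dense(N_GENRES, activation='sigmoid', name='output_layer')(context)
model = keras.Model(inputs=plot, outputs=output)
model.summary()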

You can build on top of this, since you seem to want to use other features apart from the movie reviews to come up with the final prediction. Attention largely applies to the reviews, and its benefits show when the sentences are very long.

For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e
