Bi-LSTM Attention model in Keras


Question

I am trying to make an attention model with Bi-LSTM using word embeddings. I came across How to add an attention mechanism in keras?, https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py and https://github.com/keras-team/keras/issues/4962.

However, I am confused about the implementation of the paper Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. So far I have:

from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
from keras.layers import Flatten, Activation, RepeatVector, Permute, Lambda
from keras.layers import merge  # Keras 1 API; in Keras 2 use keras.layers.multiply instead
from keras import backend as K

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=30000,
        output_dim=300,
        input_length=max_length,  # should match the Input length above (was hard-coded to 100)
        trainable=False,
        mask_zero=False
    )(_input)

activations = Bidirectional(LSTM(20, return_sequences=True))(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)

I am confused here about which part of this corresponds to which equation in the paper.
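For reference, the attention part of the paper, as I read it, boils down to the following (H is the matrix of Bi-LSTM outputs, w a learned vector; please check this against the paper itself):

M = tanh(H)
alpha = softmax(w^T M)
r = H alpha^T
h* = tanh(r)

so alpha holds one attention weight per timestep and r is the weighted sum of the Bi-LSTM outputs.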

attention = Flatten()(attention)
attention = Activation('softmax')(attention)

What will RepeatVector do?

# the Bi-LSTM output has 2*20 = 40 features per timestep,
# so the attention vector has to be repeated 40 times (not 20)
attention = RepeatVector(40)(attention)
attention = Permute([2, 1])(attention)

# element-wise product of the activations and the attention weights
# (in Keras 2, merge(..., mode='mul') becomes multiply([activations, attention]))
sent_representation = merge([activations, attention], mode='mul')

Now, again I am not sure why this line is here.

# sum over the time axis; `units` is the per-timestep feature size, i.e. 2*20 = 40 here
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

Since I have two classes, I will have the final softmax as:

probabilities = Dense(2, activation='softmax')(sent_representation)
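For completeness, these layers still need to be wrapped into a Model and trained; a minimal sketch continuing the code above (the optimizer, loss and batch size are assumptions, not part of the question):

from keras.models import Model

model = Model(_input, probabilities)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# X: integer-encoded sequences, shape (n_samples, max_length)
# y: one-hot labels, shape (n_samples, 2)
# model.fit(X, y, batch_size=32, epochs=5)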

Answer

attention = Flatten()(attention)  

turns your tensor of attention weights into a vector (of size max_length, if your sequence length is max_length).
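To make the shape change concrete, a tiny standalone sketch (max_length assumed to be 100):

from keras.layers import Input, Flatten
from keras.models import Model

scores = Input(shape=(100, 1))           # one attention score per timestep
flat = Flatten()(scores)                 # -> a flat vector of 100 scores
print(Model(scores, flat).output_shape)  # (None, 100)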

attention = Activation('softmax')(attention)

normalizes the attention weights so that they all lie between 0 and 1 and sum to one.
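A quick numeric illustration of that normalization (plain NumPy, with made-up scores):

import numpy as np

scores = np.array([0.3, -1.2, 2.0, 0.0])         # raw tanh scores, one per timestep
weights = np.exp(scores) / np.exp(scores).sum()  # softmax
print(weights)        # every value lies in (0, 1)
print(weights.sum())  # 1.0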

attention = RepeatVector(40)(attention)  # 40 = 2*20, the Bi-LSTM feature size per timestep
attention = Permute([2, 1])(attention)


sent_representation = merge([activations, attention], mode='mul')

RepeatVector repeats the attention-weight vector (of length max_len) once per feature of the hidden state so that it can be multiplied element-wise with the activations, and Permute swaps the axes so the two tensors line up. Note that because the LSTM is bidirectional, each timestep carries 2*20 = 40 features, so the repeat factor must be 40 and the tensor activations has shape max_len x 40.
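These two layers are only there to give the weights the same shape as the activations; in NumPy terms the whole multiply step amounts to a broadcast (a sketch with made-up values):

import numpy as np

max_len, feat = 100, 40                      # 40 = 2*20 from the Bi-LSTM
activations = np.random.rand(max_len, feat)  # hidden states, one row per timestep
weights = np.random.rand(max_len)
weights /= weights.sum()                     # stand-in for the softmax outputs

weighted = activations * weights[:, None]    # shape (max_len, feat)
print(weighted.shape)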

sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

This Lambda layer sums the weighted hidden-state vectors over the time axis to obtain the single fixed-size vector that is used at the end.
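Relating this back to the paper: Dense(1, activation='tanh') followed by the softmax plays the role of alpha = softmax(w^T tanh(H)), and the multiply-and-sum steps compute r = H alpha^T. Putting everything together, a consolidated sketch in Keras 2 style (where merge(..., mode='mul') becomes multiply; the hyperparameters are the ones assumed above):

from keras.layers import (Input, Embedding, Bidirectional, LSTM, Dense,
                          Flatten, Activation, RepeatVector, Permute,
                          Lambda, multiply)
from keras.models import Model
from keras import backend as K

max_length = 100

_input = Input(shape=(max_length,), dtype='int32')
embedded = Embedding(input_dim=30000, output_dim=300,
                     input_length=max_length, trainable=False,  # pass weights=[...] for pre-trained vectors
                     mask_zero=False)(_input)

activations = Bidirectional(LSTM(20, return_sequences=True))(embedded)  # (None, max_length, 40)

attention = Dense(1, activation='tanh')(activations)  # (None, max_length, 1)
attention = Flatten()(attention)                      # (None, max_length)
attention = Activation('softmax')(attention)          # weights sum to 1
attention = RepeatVector(40)(attention)               # (None, 40, max_length)
attention = Permute([2, 1])(attention)                # (None, max_length, 40)

sent_representation = multiply([activations, attention])  # weighted hidden states
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2),
                             output_shape=(40,))(sent_representation)

probabilities = Dense(2, activation='softmax')(sent_representation)

model = Model(inputs=_input, outputs=probabilities)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()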

Hope this helped!
