Concatenating Attention layer with decoder input in a seq2seq model on Keras

Problem description

I am trying to implement a sequence-to-sequence model with attention using the Keras library. The block diagram of the model is as follows.

The model embeds the input sequence into a 3D tensor. Then a bidirectional LSTM creates the encoding layer. Next, the encoded sequences are sent to a custom Attention layer that returns a 2D tensor holding the attention weights for each hidden node. The decoder input is injected into the model as one-hot vectors. Now in the decoder (another bidirectional LSTM), both the decoder input and the attention weights are passed as input. The output of the decoder is sent to a time-distributed dense layer with a softmax activation function to get a probability distribution over the output at every time step. The code of the model is as follows:

encoder_input = Input(shape=(MAX_LENGTH_Input,))
embedded = Embedding(input_dim=vocab_size_input, output_dim=embedding_width, trainable=False)(encoder_input)
encoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Input, embedding_width), return_sequences=True, dropout=0.25, recurrent_dropout=0.25))(embedded)
attention = Attention(MAX_LENGTH_Input)(encoder)  # 2D tensor, e.g. (None, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))  # 3D tensor, e.g. (None, 10, 8281)
merge = concatenate([attention, decoder_input])  # fails here
decoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Output, vocab_size_output))(merge))
output = TimeDistributed(Dense(MAX_LENGTH_Output, activation="softmax"))(decoder)

The problem arises when I concatenate the attention layer and the decoder input. Since the decoder input is a 3D tensor whereas the attention output is a 2D tensor, it raises the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1024), (None, 10, 8281)]

How can I convert the 2D attention tensor into a 3D tensor?
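
For reference, the mismatch can be reproduced with just the two shapes from the traceback (1024 for the attention vector, 10 x 8281 for the one-hot decoder input). A minimal sketch, assuming tf.keras and placeholder Input layers standing in for the real model:

from tensorflow.keras.layers import Input, concatenate

attention_vec = Input(shape=(1024,))      # 2D attention output: (None, 1024)
decoder_in = Input(shape=(10, 8281))      # 3D one-hot decoder input: (None, 10, 8281)
merge = concatenate([attention_vec, decoder_in])  # raises the ValueError quoted above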

Recommended answer

Based on your block diagram, it looks like you pass the same attention vector to the decoder at every timestep. In that case you need RepeatVector to copy the same attention vector at every timestep, converting the 2D attention tensor into a 3D tensor:

# ...
attention = Attention(MAX_LENGTH_Input)(encoder)
attention = RepeatVector(MAX_LENGTH_Output)(attention) # (?, 10, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output,vocab_size_output))
merge = concatenate([attention, decoder_input]) # (?, 10, 1024+8281)
# ...

Take note that this will repeat the same attention vector for every timestep.
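
For completeness, here is a runnable sketch of just the merge path with the fix applied, using the shapes from the error message; the tf.keras imports and the throwaway probe Model are assumptions made for illustration:

from tensorflow.keras.layers import Input, RepeatVector, concatenate
from tensorflow.keras.models import Model

attention_vec = Input(shape=(1024,))            # 2D attention output: (None, 1024)
decoder_in = Input(shape=(10, 8281))            # 3D one-hot decoder input: (None, 10, 8281)

repeated = RepeatVector(10)(attention_vec)      # copy the vector 10 times -> (None, 10, 1024)
merge = concatenate([repeated, decoder_in])     # (None, 10, 1024 + 8281) = (None, 10, 9305)

probe = Model([attention_vec, decoder_in], merge)
print(probe.output_shape)                       # (None, 10, 9305)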
