Adding Attention on top of simple LSTM layer in Tensorflow 2.0


Problem Description

I have a simple network of one LSTM and two Dense layers as such:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(layers.Dense(20, activation='sigmoid'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error')

It is training on data with 3 inputs (normalized 0 to 1.0) and 1 output (binary) for the purpose of classification. The data is time series data where there is a relation between time steps.

    var1(t)   var2(t)   var3(t)  var4(t)
0  0.448850  0.503847  0.498571      0.0
1  0.450992  0.503480  0.501215      0.0
2  0.451011  0.506655  0.503049      0.0

The model is trained like this:

history = model.fit(train_X, train_y, epochs=2800, batch_size=40, validation_data=(test_X, test_y), verbose=2, shuffle=False)
model.summary()

giving the model summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 20)                1920      
_________________________________________________________________
dense (Dense)                (None, 20)                420       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 21        
=================================================================
Total params: 2,361
Trainable params: 2,361
Non-trainable params: 0

The model works reasonably well. Now I am trying to replace the Dense(20) layer with an Attention layer. All the examples, tutorials, etc. online (including the TF docs) are for seq2seq models with an embedding layer at the input layer. I understand the seq2seq implementations in TF v1.x but I cannot find any documentation for what I am trying to do. I believe in the new API (v2.0) I need to do something like this:

lstm = layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2]), return_sequences=True)
lstm = tf.keras.layers.Bidirectional(lstm)
attention = layers.Attention() # this does not work

model = tf.keras.Sequential()
model.add(lstm)
model.add(attention)
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error')

And of course I get the error "Attention layer must be called on a list of inputs, namely [query, value] or [query, value, key]"

I do not understand the solution to this in version 2.0 and for this case (time series data with fixed-length input). Any ideas on adding attention to this type of problem are welcome.

Recommended Answer

You must call the attention layer like this:

attention = layers.Attention()([query, value])  # a list of input tensors: [query, value] or [query, value, key]

See the tf.keras.layers.Attention API docs for details.
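
Because the Attention layer is called on a list of tensors, it cannot simply be stacked in a Sequential model. A minimal sketch with the functional API might look like the following, assuming self-attention (the LSTM sequence output is used as both query and value) and a GlobalAveragePooling1D layer to collapse the time dimension before the final Dense layer:

import tensorflow as tf
from tensorflow.keras import layers

# Functional API: needed because Attention takes a list of tensors
inputs = layers.Input(shape=(train_X.shape[1], train_X.shape[2]))
# return_sequences=True keeps the per-timestep outputs that attention attends over
lstm_out = layers.LSTM(20, return_sequences=True)(inputs)
# Self-attention: the same sequence serves as both query and value
attention_out = layers.Attention()([lstm_out, lstm_out])
# Collapse the time dimension before the final classifier
pooled = layers.GlobalAveragePooling1D()(attention_out)
outputs = layers.Dense(1, activation='sigmoid')(pooled)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='mean_squared_error')

The pooling step is an assumption; a Flatten layer, or using only the last timestep as the query, would also work. The key point is that layers.Attention() receives a list of tensors rather than being added to a Sequential model.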
