How to add an attention mechanism in keras?


Question

I'm currently using this code that I got from one discussion on GitHub. Here's the code of the attention mechanism:

from keras.layers import Input, Embedding, LSTM, Dense, Flatten, Activation, RepeatVector, Permute, Lambda, merge
from keras import backend as K

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=vocab_size,
        output_dim=embedding_size,
        input_length=max_length,
        trainable=False,
        mask_zero=False
    )(_input)

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)

# weight the LSTM outputs by the attention scores and sum over the time axis
# (merge with mode='mul' is the Keras 1 API)
sent_representation = merge([activations, attention], mode='mul')
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

probabilities = Dense(3, activation='softmax')(sent_representation)

Is this the correct way to do it? I was sort of expecting a TimeDistributed layer, since the attention mechanism is applied at every time step of the RNN. I need someone to confirm that this implementation (the code) is a correct implementation of an attention mechanism. Thank you.
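For reference, a minimal sketch of the TimeDistributed variant mentioned above, assuming Keras 2 semantics (where a Dense layer applied to a 3D tensor already acts on the last axis at every time step, so both forms produce the same per-time-step scores):

from keras.layers import TimeDistributed, Dense

# hypothetical alternative to the Dense(1, activation='tanh') call above;
# in Keras 2 both give scores of shape (batch_size, max_length, 1)
attention_scores = TimeDistributed(Dense(1, activation='tanh'))(activations)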

Answer

If you want to have attention along the time dimension, then this part of your code seems correct to me:

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)

sent_representation = merge([activations, attention], mode='mul')
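As a side note, merge(..., mode='mul') only exists in Keras 1; a minimal sketch of the same block against the Keras 2 functional API (assuming the Multiply layer as the replacement) would be:

from keras.layers import Dense, Flatten, Activation, RepeatVector, Permute, Multiply

attention = Dense(1, activation='tanh')(activations)   # (batch_size, max_length, 1)
attention = Flatten()(attention)                       # (batch_size, max_length)
attention = Activation('softmax')(attention)           # one weight per time step
attention = RepeatVector(units)(attention)             # (batch_size, units, max_length)
attention = Permute([2, 1])(attention)                 # (batch_size, max_length, units)

# element-wise product replaces merge([...], mode='mul')
sent_representation = Multiply()([activations, attention])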

You've worked out the attention vector of shape (batch_size, max_length):

attention = Activation('softmax')(attention)

I've never seen this code before, so I can't say if it is actually correct or not:

K.sum(xin, axis=-2)
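For what it's worth, a small NumPy check (an illustration, not from the answer) of what that Lambda computes: after the element-wise multiply, xin has shape (batch_size, max_length, units), and summing over axis=-2, the time axis, yields the attention-weighted sum of the LSTM outputs with shape (batch_size, units):

import numpy as np

batch_size, max_length, units = 2, 5, 4
activations = np.random.rand(batch_size, max_length, units)  # stand-in for the LSTM outputs
weights = np.random.rand(batch_size, max_length)
weights /= weights.sum(axis=1, keepdims=True)                # softmax-like weights over time

xin = activations * weights[:, :, None]                      # same as merge([...], mode='mul')
summed = xin.sum(axis=-2)                                    # what K.sum(xin, axis=-2) does
print(summed.shape)                                          # (2, 4) -> (batch_size, units)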

Further reading (you might want to have a look):

https://github.com/philipperemy/keras-attention-mechanism
