Custom Attention Layer in Keras


Problem Description

I want to create a custom attention layer such that, for the input at any time step, the layer returns the weighted mean of the inputs over all time steps.

For example, I want an input tensor of shape [32, 100, 2048] to go into the layer and to get back a tensor of shape [32, 100, 2048]. I wrote the layer as follows:

import tensorflow as tf
from keras.layers import Layer, Dense
# or
from tensorflow.keras.layers import Layer, Dense


class Attention(Layer):

  def __init__(self, units_att):
     self.units_att = units_att
     self.W = Dense(units_att)
     self.V = Dense(1)
     super().__init__()

  def __call__(self, values):
      t = tf.constant(0, dtype=tf.int32)
      time_steps = tf.shape(values)[1]
      initial_outputs = tf.TensorArray(dtype=tf.float32, size=time_steps)
      initial_att = tf.TensorArray(dtype=tf.float32, size=time_steps)

      def should_continue(t, *args):
          return t < time_steps

      def iteration(t, values, outputs, atts):
        score = self.V(tf.nn.tanh(self.W(values)))

        # attention_weights shape == (batch_size, time_step, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        outputs = outputs.write(t, context_vector)
        atts = atts.write(t, attention_weights)
        return t + 1, values, outputs, atts

      t, values, outputs, atts = tf.while_loop(should_continue, iteration,
                                  [t, values, initial_outputs, initial_att])

      outputs = outputs.stack()
      outputs = tf.transpose(outputs, [1,0,2])

      atts = atts.stack()
      atts = tf.squeeze(atts, -1)
      atts = tf.transpose(atts, [1,0,2])
      return t, values, outputs, atts

For input = tf.constant(2, shape=[32, 100, 2048], dtype=tf.float32) I get output of shape [32, 100, 2048] in tf2 and [32, None, 2048] in tf1.

For input = Input(shape=(None, 2048)) I get output of shape [None, None, 2048] in tf1, and in tf2 I get the error

TypeError: 'Tensor' object cannot be interpreted as an integer

Finally, in both cases I can't use this layer in my model, because my model input is Input(shape=(None, 2048)) and I get the error

AttributeError: 'NoneType' object has no attribute '_inbound_nodes'

in both tf1 and tf2. I build my model with the Keras functional API.

Recommended Answer

From the code you have shared, it looks like you want to implement Bahdanau's attention layer in your code. You want to attend to all the 'values' (the previous layer's output, i.e. all of its hidden states), and your 'query' would be the last hidden state of the decoder. Your code should actually be very simple and should look like this:

        class Bahdanau(tf.keras.layers.Layer):
            def __init__(self, n):
                super(Bahdanau, self).__init__()
                self.w = tf.keras.layers.Dense(n)   # transforms the query
                self.u = tf.keras.layers.Dense(n)   # transforms the values
                self.v = tf.keras.layers.Dense(1)   # scores each time step

            def call(self, query, values):
                # query: (batch, hidden), values: (batch, time_steps, hidden)
                query = tf.expand_dims(query, 1)                         # (batch, 1, hidden)
                e = self.v(tf.nn.tanh(self.w(query) + self.u(values)))   # scores: (batch, time_steps, 1)
                a = tf.nn.softmax(e, axis=1)                             # attention weights over time
                c = a * values
                c = tf.reduce_sum(c, axis=1)                             # context vector: (batch, hidden)
                return a, c

        ## Say we want 10 units in the single-layer MLP determining w, u
        attentionlayer = Bahdanau(10)
        ## Call with i/p: decoder state @ t-1 and all encoder hidden states
        a, c = attentionlayer(stminus1, hj)


We are not specifying the tensor shape anywhere in the code. This code will return a context tensor of the same size as 'stminus1', which is the 'query'. It does this after attending to all the 'values' (all output states of the encoder) using Bahdanau's attention mechanism.

So assuming your batch size is 32, timesteps = 100 and embedding dimension = 2048, the shape of stminus1 should be (32, 2048) and the shape of hj should be (32, 100, 2048). The shape of the output context will be (32, 2048). We also return the 100 attention weights in case you want to route them to a nice display.
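
To make those shapes concrete, here is a minimal usage sketch; the tensors below are just random placeholders standing in for the real decoder state and encoder outputs:

        import tensorflow as tf

        batch_size, timesteps, dim = 32, 100, 2048
        stminus1 = tf.random.normal((batch_size, dim))        # decoder state at t-1 (the 'query')
        hj = tf.random.normal((batch_size, timesteps, dim))   # all encoder hidden states (the 'values')

        attentionlayer = Bahdanau(10)
        a, c = attentionlayer(stminus1, hj)
        print(a.shape)   # (32, 100, 1)  -- one attention weight per time step
        print(c.shape)   # (32, 2048)    -- context vector, same size as stminus1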

This is the simplest version of 'Attention'. If you have any other intent, please let me know and I will reformat my answer. For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e
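
If you do still want the per-time-step output of shape [32, 100, 2048] from the original question (one context vector per time step, each a weighted mean over all time steps), below is a minimal sketch of one way to do that with the same Bahdanau-style scoring. This is only an illustration, not part of the layer above, and the time x time intermediate can be memory-hungry for long sequences:

        class BahdanauSelf(tf.keras.layers.Layer):
            """Sketch: every time step acts as a query over all time steps."""
            def __init__(self, n):
                super(BahdanauSelf, self).__init__()
                self.w = tf.keras.layers.Dense(n)
                self.u = tf.keras.layers.Dense(n)
                self.v = tf.keras.layers.Dense(1)

            def call(self, values):                              # values: (batch, T, dim)
                q = tf.expand_dims(self.w(values), 2)            # (batch, T, 1, n) queries
                k = tf.expand_dims(self.u(values), 1)            # (batch, 1, T, n) keys
                e = self.v(tf.nn.tanh(q + k))                    # (batch, T, T, 1) scores
                a = tf.nn.softmax(e, axis=2)                     # weights over the value axis
                c = tf.reduce_sum(a * tf.expand_dims(values, 1), axis=2)  # (batch, T, dim)
                return a, c

        selfatt = BahdanauSelf(10)
        a, c = selfatt(tf.random.normal((2, 5, 8)))   # small placeholder input; c.shape == (2, 5, 8)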
