Output shapes of Keras AdditiveAttention Layer


Problem Description

I am trying to use the AdditiveAttention layer in Keras. With a manual implementation of the layer from the TensorFlow tutorial https://www.tensorflow.org/tutorials/text/nmt_with_attention:

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    # query shape == (batch_size, hidden_size)
    # query_with_time_axis shape == (batch_size, 1, hidden_size)
    query_with_time_axis = tf.expand_dims(query, 1)
    # score shape == (batch_size, max_length, 1)
    score = self.V(tf.nn.tanh(
        self.W1(query_with_time_axis) + self.W2(values)))
    # attention_weights shape == (batch_size, max_length, 1)
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)
    return context_vector, attention_weights

the shape of context_vector is (batch_size, units).

Whereas, using the Keras built-in

from tensorflow.keras.layers import AdditiveAttention

the shape of context_vector is [batch_size, Tq, dim].

Any suggestions on what is causing this difference in output shape would be useful.
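
For reference, here is a minimal sketch of the kind of call that produces the built-in layer's output shape; the tensor sizes below are illustrative assumptions, not taken from the original code:

import tensorflow as tf

# Illustrative sizes only (assumed for this sketch).
batch_size, Tq, Tv, dim = 2, 60, 60, 512

query = tf.random.uniform((batch_size, Tq, dim))
value = tf.random.uniform((batch_size, Tv, dim))

attention = tf.keras.layers.AdditiveAttention()
context_vector = attention([query, value])
context_vector.shape
# TensorShape([2, 60, 512]) == [batch_size, Tq, dim]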

Recommended Answer

Both implementations are quite similar apart from a few differences. The implementation of BahdanauAttention in that tutorial is a somewhat simplified and adapted version that uses extra linear transformations. The return shape of context_vector that you're wondering about is simply a matter of the input data shape. Here is a demonstration; let's look at the tutorial implementation first:

class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V  = tf.keras.layers.Dense(1)

  def call(self, query, values):
    query_with_time_axis = tf.expand_dims(query, 1)
    score = self.V(tf.nn.tanh(self.W1(query_with_time_axis) + self.W2(values)))
    attention_weights = tf.nn.softmax(score, axis=1)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)
    return context_vector, attention_weights

Now, let's pass some inputs to it, both 3D and 2D.

attention_layer = BahdanauAttention(10)

y = tf.random.uniform((2, 60, 512))
out, attn = attention_layer(y, y)
out.shape, attn.shape
# (TensorShape([2, 60, 512]), TensorShape([2, 2, 60, 1]))

y = tf.random.uniform((2, 512))
out, attn = attention_layer(y, y)
out.shape, attn.shape
# (TensorShape([2, 512]), TensorShape([2, 2, 1]))

Now, let's pass the same inputs to the built-in AdditiveAttention layer and see what we get:

built_attn = tf.keras.layers.AdditiveAttention()

y = tf.random.uniform((2, 60, 512))
out, attn = built_attn([y, y], return_attention_scores=True)
out.shape, attn.shape
# (TensorShape([2, 60, 512]), TensorShape([2, 60, 60]))

y = tf.random.uniform((2, 512))
out, attn = built_attn([y, y], return_attention_scores=True)
out.shape, attn.shape
# (TensorShape([2, 512]), TensorShape([2, 2]))

So the shape of context_vector is comparable here, but not the shape of attention_weights. The reason is, as mentioned, that the implementation in that tutorial is somewhat modified and adapted, I believe. If we look at how BahdanauAttention, or AdditiveAttention, is computed, we get:

  1. Reshape query and value into shapes [batch_size, Tq, 1, dim] and [batch_size, 1, Tv, dim] respectively.
  2. Calculate scores with shape [batch_size, Tq, Tv] as a non-linear sum: scores = tf.reduce_sum(tf.tanh(query + value), axis=-1).
  3. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf.nn.softmax(scores).
  4. Use distribution to create a linear combination of values with shape [batch_size, Tq, dim]: return tf.matmul(distribution, value) (see the sketch below).
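
As a quick check, here is a minimal sketch of steps 1 to 4 written with raw TensorFlow ops; the tensor sizes are assumed for illustration and match the demo above:

q = tf.random.uniform((2, 60, 512))                       # [batch_size, Tq, dim]
v = tf.random.uniform((2, 60, 512))                       # [batch_size, Tv, dim]

query = tf.expand_dims(q, 2)                              # 1. [batch_size, Tq, 1, dim]
value = tf.expand_dims(v, 1)                              #    [batch_size, 1, Tv, dim]
scores = tf.reduce_sum(tf.tanh(query + value), axis=-1)   # 2. [batch_size, Tq, Tv]
distribution = tf.nn.softmax(scores)                      # 3. [batch_size, Tq, Tv]
out = tf.matmul(distribution, v)                          # 4. [batch_size, Tq, dim]

out.shape, distribution.shape
# (TensorShape([2, 60, 512]), TensorShape([2, 60, 60]))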

And I think the implementation in that tutorial computes the attention weight features a bit differently. If we follow the above approach (steps 1 to 4), we get the same output shape for attention_weights as well. Here is how (note this is just for demonstration purposes, not a general implementation):

class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    # The linear transformations are kept from the original class but are not
    # used below; this version only mimics the built-in layer's shape behaviour.
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    query_with_time_axis = tf.expand_dims(query, 2)  # [batch_size, Tq, 1, dim]
    value_with_time_axis = tf.expand_dims(values, 1) # [batch_size, 1, Tv, dim]
    scores = tf.reduce_sum(tf.tanh(query_with_time_axis +
                                   value_with_time_axis), axis=-1)
    distribution = tf.nn.softmax(scores)             # [batch_size, Tq, Tv]
    return tf.matmul(distribution, values), distribution

Now, if we pass the same input, we will get the same output shapes from both implementations. However, for general use cases, the built-in implementation should be preferred.

attention_layer = BahdanauAttention(10)

y = tf.random.uniform((2, 60, 512))
out, attn = attention_layer(y, y)
out.shape, attn.shape
# (TensorShape([2, 60, 512]), TensorShape([2, 60, 60]))

built_attn = tf.keras.layers.AdditiveAttention()
y = tf.random.uniform((2, 60, 512))
out, attn = built_attn([y, y], return_attention_scores=True)
out.shape, attn.shape
# (TensorShape([2, 60, 512]), TensorShape([2, 60, 60]))
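
For completeness, here is a minimal sketch of how the built-in layer might be wired into a small functional model in which query and value come from different sequences; all layer sizes here are illustrative assumptions, not taken from the original post:

query_in = tf.keras.Input(shape=(20, 64))   # e.g. decoder states, [Tq, dim]
value_in = tf.keras.Input(shape=(60, 64))   # e.g. encoder outputs, [Tv, dim]

context = tf.keras.layers.AdditiveAttention()([query_in, value_in])  # [batch, Tq, dim]
outputs = tf.keras.layers.Dense(10)(context)

model = tf.keras.Model([query_in, value_in], outputs)
model.summary()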
