Why is the embedding vector multiplied by a constant in the Transformer model?

Question

I am learning to apply the Transformer model proposed in Attention Is All You Need, following the official TensorFlow tutorial Transformer model for language understanding.

The positional encoding section says:

Since this model doesn't contain any recurrence or convolution, positional encoding is added to give the model some information about the relative position of the words in the sentence.

The positional encoding vector is added to the embedding vector.

My understanding is that the positional encoding vector is added directly to the embedding vector. But when I looked at the code, I found that the embedding vector is multiplied by a constant first.

The code in the Encoder section is as follows:

class Encoder(tf.keras.layers.Layer):
  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, 
               rate=0.1):
    super(Encoder, self).__init__()

    self.d_model = d_model
    self.num_layers = num_layers

    self.embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)
    self.pos_encoding = positional_encoding(input_vocab_size, self.d_model)


    self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) 
                       for _ in range(num_layers)]

    self.dropout = tf.keras.layers.Dropout(rate)

  def call(self, x, training, mask):

    seq_len = tf.shape(x)[1]

    # adding embedding and position encoding.
    x = self.embedding(x)  # (batch_size, input_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))  # scale embeddings by sqrt(d_model)
    x += self.pos_encoding[:, :seq_len, :]

    x = self.dropout(x, training=training)

    for i in range(self.num_layers):
      x = self.enc_layers[i](x, training, mask)

    return x  # (batch_size, input_seq_len, d_model)
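
The positional_encoding helper used above is not shown in the snippet. For reference, here is a minimal sketch of the standard sinusoidal encoding from the paper, with the same name and output shape the Encoder assumes; treat it as an illustrative reconstruction rather than the tutorial's exact code:

import numpy as np
import tensorflow as tf

def positional_encoding(position, d_model):
  # angle for token position pos and dimension i: pos / 10000^(2*(i//2)/d_model)
  angle_rads = (np.arange(position)[:, np.newaxis] /
                np.power(10000, (2 * (np.arange(d_model)[np.newaxis, :] // 2)) / np.float32(d_model)))
  # sine on even dimensions, cosine on odd dimensions
  angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
  angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
  return tf.cast(angle_rads[np.newaxis, ...], dtype=tf.float32)  # (1, position, d_model)

Note that every entry is a sine or cosine, so the encoding values are bounded to [-1, 1].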

We can see x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32)) before x += self.pos_encoding[:, :seq_len, :].

So why is the embedding vector multiplied by a constant before the positional encoding is added in the Transformer model?

Answer

Looking around, I found this argument:

The reason we increase the embedding values before the addition is to make the positional encoding relatively smaller. This means the original meaning in the embedding vector won’t be lost when we add them together.
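
A quick way to see this is to compare magnitudes directly. The sinusoidal positional encoding is bounded to [-1, 1], while a freshly initialized tf.keras.layers.Embedding layer draws its weights from a small uniform range (roughly ±0.05 by default), so without the scaling the positional signal would dominate the embedding. The following sketch, using the positional_encoding reconstruction above and arbitrary toy sizes, just prints the average magnitudes involved:

import tensorflow as tf

d_model = 512
embedding = tf.keras.layers.Embedding(input_dim=8500, output_dim=d_model)
pos_encoding = positional_encoding(50, d_model)        # bounded to [-1, 1]

tokens = tf.constant([[1, 2, 3, 4, 5]])                # toy batch of token ids
emb = embedding(tokens)                                # (1, 5, d_model), tiny values at init
scaled = emb * tf.math.sqrt(tf.cast(d_model, tf.float32))

print('mean |embedding|          :', tf.reduce_mean(tf.abs(emb)).numpy())
print('mean |scaled embedding|   :', tf.reduce_mean(tf.abs(scaled)).numpy())
print('mean |positional encoding|:', tf.reduce_mean(tf.abs(pos_encoding)).numpy())

With the default uniform(-0.05, 0.05) embedding initializer the first number comes out around 0.025, so multiplying by sqrt(512) ≈ 22.6 lifts the embeddings to roughly the same order of magnitude as the positional encoding rather than leaving them about 25 times smaller.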
