Understanding Bahdanau's Attention Linear Algebra


Problem Description

Bahdanau's additive attention is shown as the second part of equation 4 in the image below (the additive score, presumably of the form score(ht, hs) = vᵀ · tanh(w1 · ht + w2 · hs)).

I am trying to figure out the shapes of the matrices w1, w2, ht, hs and v in order to work out how this mechanism is used in this paper.

  1. Can ht and hs have different final dimensions, say (batch size, total units) and (batch size, time window)? Equation 8 in the paper mentioned above seems to be doing this.

  2. Equation 8 in the above paper has the below notation. What will this expand to:

(W1 · ht-1) + (W1 · Ct-1)

or

W1 · concatenation(ht-1, ct-1)?

I have seen both being used. Any quick explanation of the above matrix shapes would be much appreciated.

Answer

Maybe understanding this with a specific example will help. Let us say you have a 19-word tweet and you want to translate it into another language. You create embeddings for the words and then pass them through a bidirectional LSTM layer of 128 units. The encoder now outputs 19 hidden states of 256 dimensions for each tweet. Let us say the decoder is unidirectional and has 128 units. It starts translating the words, outputting a hidden state at each time step.
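As a rough sketch of this setup (assuming TensorFlow/Keras; the vocabulary and embedding sizes are made up purely for illustration):

import tensorflow as tf

vocab_size, embed_dim, seq_len, units = 8000, 64, 19, 128        # illustrative sizes

tokens = tf.keras.Input(shape=(seq_len,), dtype="int32")          # (batch, 19)
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(tokens)      # (batch, 19, 64)

# Bidirectional LSTM encoder: 128 units per direction -> 256-dim state per step
encoder_states = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units, return_sequences=True))(x)        # (batch, 19, 256)

# Unidirectional decoder: its hidden state at step t-1 is the s_tminus1 below
decoder_cell = tf.keras.layers.LSTMCell(units)                    # state shape (batch, 128)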

Now you want to bring Bahdanau's attention into this setup. You feed in s_tminus1 from the decoder and all hidden states of the encoder (hj), and get the context using the following steps:

1. Generate the alignment scores v · tanh(w · s_tminus1 + u · hj) (Bahdanau's additive score applies a tanh before multiplying by v).

2. Take a softmax of the above to get the 19 attention weights for each tweet, then multiply these attention weights by the encoder hidden states; the weighted sum is nothing but the context.
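A minimal numpy sketch of these two steps, using the shapes from this example (19 encoder states of 256 dimensions, a 128-dimensional decoder state, and an alignment size of n = 10 as assumed below):

import numpy as np

batch, T, enc_dim, dec_dim, n = 2, 19, 256, 128, 10
s_tminus1 = np.random.randn(batch, dec_dim)       # decoder state at t-1: (2, 128)
hj        = np.random.randn(batch, T, enc_dim)    # encoder hidden states: (2, 19, 256)

w = np.random.randn(dec_dim, n)                   # (128, 10)
u = np.random.randn(enc_dim, n)                   # (256, 10)
v = np.random.randn(n, 1)                         # (10, 1)

# Step 1: scores v . tanh(w . s_tminus1 + u . hj)  -> (2, 19, 1)
scores = np.tanh((s_tminus1 @ w)[:, None, :] + hj @ u) @ v

# Step 2: softmax over the 19 time steps, then the weighted sum of encoder states
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # (2, 19, 1)
context = (weights * hj).sum(axis=1)                                   # (2, 256)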

Note that in the Bahdanau model the decoder should be unidirectional. The shapes then work out as follows:

Assume n = 10 units for the alignment layer that determines w and u. Then the shapes of s_tminus1 and hj are (?, 128) and (?, 19, 256) respectively. Note that s_tminus1 is the single decoder hidden state at t-1 and hj are the 19 hidden states of the bidirectional encoder.

We have to expand s_tminus1 to (?, 1, 128) for the addition along the time axis that follows. The layer weights for w, u and v are determined automatically by the framework as (128, 10), (256, 10) and (10, 1) respectively. Notice how self.w(s_tminus1) works out to (?, 1, 10). This is added to each of the self.u(hj) to give a shape of (?, 19, 10). The result is fed to self.v, and the output is (?, 19, 1), which is the shape we want: a set of 19 weights. Softmaxing this gives the attention weights.
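A minimal Keras-style sketch of the layer implied by self.w, self.u and self.v above (hypothetical class name, assuming TensorFlow/Keras and n = 10):

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, n=10):
        super().__init__()
        self.w = tf.keras.layers.Dense(n)   # kernel builds to (128, 10)
        self.u = tf.keras.layers.Dense(n)   # kernel builds to (256, 10)
        self.v = tf.keras.layers.Dense(1)   # kernel builds to (10, 1)

    def call(self, s_tminus1, hj):
        # (?, 128) -> (?, 1, 128) so the addition broadcasts along the time axis
        s = tf.expand_dims(s_tminus1, 1)
        # (?, 1, 10) + (?, 19, 10) -> tanh -> self.v -> (?, 19, 1)
        scores = self.v(tf.nn.tanh(self.w(s) + self.u(hj)))
        weights = tf.nn.softmax(scores, axis=1)          # 19 attention weights per tweet
        context = tf.reduce_sum(weights * hj, axis=1)    # (?, 256)
        return context, weights

Calling it as context, weights = BahdanauAttention(10)(decoder_state, encoder_states) reproduces the shapes walked through above.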

Multiplying these attention weights by each encoder hidden state and summing over the time steps returns the context.

Hope this clarifies the shapes of the various tensors and weights.

To answer your other questions: the dimensions of ht and hs can be different, as shown in the example above. As for your second question, I have seen the two vectors being concatenated and then a single weight applied to them; at least that is what I remember reading in the original paper (see the quick check below).
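For what it's worth, the two forms in the question coincide when the single weight matrix is the column-wise block [W1 | W2]; a quick numpy check with made-up sizes:

import numpy as np

h = np.random.randn(128)                  # e.g. h_{t-1}
c = np.random.randn(256)                  # e.g. c_{t-1}
W1 = np.random.randn(10, 128)
W2 = np.random.randn(10, 256)
W = np.concatenate([W1, W2], axis=1)      # (10, 384) = [W1 | W2]

lhs = W @ np.concatenate([h, c])          # single weight on the concatenation
rhs = W1 @ h + W2 @ c                     # separate weights, then add
print(np.allclose(lhs, rhs))              # True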

