BERT - Pooled output is different from first vector of sequence output


Question

I am using BERT in TensorFlow and there is one detail I don't quite understand. According to the documentation (https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1), the pooled output is a representation of the entire sequence. Based on the original paper, it seems like this is the output for the token "CLS" at the beginning of the sentence.

pooled_output[0]

However, when I look at the output corresponding to the first token in the sentence

sequence_output[0,0,:]

which I believe corresponds to the token "CLS" (the first token in the sentence), the two results are different.
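To make the indexing concrete, here is a toy numpy sketch of the two output shapes (the array contents are random placeholders, not real BERT activations):

```python
import numpy as np

batch, seq_len, hidden = 1, 16, 768   # 768 is BERT-base's hidden size

# Stand-ins for the two module outputs:
# sequence_output: one vector per token, pooled_output: one per sequence
sequence_output = np.random.randn(batch, seq_len, hidden)
pooled_output = np.random.randn(batch, hidden)

first_token_vector = sequence_output[0, 0, :]   # the [CLS] position
print(first_token_vector.shape)   # (768,)
print(pooled_output[0].shape)     # (768,)
```

Both slices have the same shape, which is why one might expect them to be equal.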

Answer

The intentions of pooled_output and sequence_output are different. Since the embeddings from the output layer of a BERT model are contextual embeddings, the output for the first token, i.e. the [CLS] token, will have captured sufficient context. Hence, the authors of the BERT paper found it sufficient to use only the output of the first token for a few tasks such as classification. They call this output from the single (first) token the pooled_output.

The source code of the TF Hub module is not available, but it is reasonable to assume that TF Hub uses the same implementation as the open-sourced code from the BERT authors (https://github.com/google-research/bert/). As the modeling.py script shows (https://github.com/google-research/bert/blob/bee6030e31e42a9394ac567da170a89a98d2062f/modeling.py), the pooled_output (obtained via the get_pooled_output() function) takes the hidden state of the first token and passes it through an additional dense layer with a tanh activation. That extra transform is why pooled_output does not equal the raw first vector of sequence_output.
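The pooling step described above can be sketched in a few lines of numpy (toy dimensions and random weights for illustration, not real BERT parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8   # toy size; BERT-base uses 768

# Toy stand-in for sequence_output: [batch, seq_len, hidden]
sequence_output = rng.standard_normal((1, 4, hidden_size))

# modeling.py pools by taking the first token's hidden state and
# running it through a dense layer with a tanh activation:
#     pooled_output = tanh(first_token @ W + b)
W = rng.standard_normal((hidden_size, hidden_size))
b = np.zeros(hidden_size)

first_token = sequence_output[:, 0, :]        # the [CLS] hidden state
pooled_output = np.tanh(first_token @ W + b)

# The dense + tanh transform is why the two vectors differ.
print(np.allclose(pooled_output, first_token))  # False
```

So pooled_output is derived from sequence_output[:, 0, :], but it is not identical to it.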

