What is the difference between an Embedding Layer and a Dense Layer?


Question

The docs for an Embedding Layer in Keras say:

Turns positive integers (indexes) into dense vectors of fixed size. E.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

I believe this could also be achieved by encoding the inputs as one-hot vectors of length vocabulary_size and feeding them into a Dense Layer.

Is an Embedding Layer merely a convenience for this two-step process, or is something fancier going on under the hood?

Answer

Mathematically, the difference is this:

  • An embedding layer performs a select operation. In Keras, this layer is equivalent to:

K.gather(self.embeddings, inputs)      # just one matrix

  • A dense layer performs a dot-product operation, plus an optional activation:

    outputs = matmul(inputs, self.kernel)  # a kernel matrix
    outputs = bias_add(outputs, self.bias) # a bias vector
    return self.activation(outputs)        # an activation function
    

  • You can emulate an embedding layer with a fully-connected layer via one-hot encoding, but the whole point of a dense embedding is to avoid the one-hot representation. In NLP, the word vocabulary size can be on the order of 100k (sometimes even a million). On top of that, it is often necessary to process batches of word sequences, and processing a batch of sequences of word indices is much more efficient than processing a batch of sequences of one-hot vectors. In addition, the gather operation itself is faster than a matrix dot-product, in both the forward and backward pass.
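The equivalence between the two layers, and the cost difference, can be sketched in NumPy. This is an illustration, not Keras internals; the array names (`W`, `idx`, `one_hot`) are made up for the example, and the dense layer here is linear with no bias so the two paths match exactly:

```python
import numpy as np

vocab_size, embedding_dim = 5, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embedding_dim))  # the shared weight matrix

idx = np.array([4, 1, 1])  # a sequence of word indices

# Embedding layer: a gather, i.e. plain row selection
emb = W[idx]

# Dense layer (linear, no bias) applied to one-hot inputs
one_hot = np.eye(vocab_size)[idx]  # shape (3, vocab_size)
dense = one_hot @ W                # full matrix product

# Same result, but the gather never materializes one_hot
assert np.allclose(emb, dense)
```

To see why the one-hot route does not scale: with a 100k-word vocabulary, `one_hot` for a batch of 32 sequences of length 100 would hold 320 million floats, while the corresponding index tensor holds only 3,200 integers.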
