What does the embedding layer for a network look like?
Question
I'm just getting started with text classification, and I'm stuck at the embedding layer. If I have a batch of sequences encoded as integers corresponding to each word, what does the embedding layer look like? Does it have neurons like a normal neural layer?
I've seen keras.layers.Embedding, but after reading the documentation I'm really confused about how it works. I can understand input_dim, but why is output_dim a 2D matrix? How many weights are there in this embedding layer?
I'm sorry if my question isn't explained clearly. I have no experience in NLP, so if word embeddings are basic NLP knowledge, please tell me and I will read up on it.
Answer
An embedding layer is just a trainable look-up table: it takes an integer index as input and returns as output the word embedding associated with that index:
index | word embeddings
=============================================================================
0 | word embedding for the word with index 0 (usually used for padding)
-----------------------------------------------------------------------------
1 | word embedding for the word with index 1
-----------------------------------------------------------------------------
2 | word embedding for the word with index 2
-----------------------------------------------------------------------------
. |
. |
. |
-----------------------------------------------------------------------------
N | word embedding for the word with index N
-----------------------------------------------------------------------------
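The look-up behavior in the table above can be sketched in plain Python. This is a toy illustration, not the Keras implementation; the vocabulary size and the embedding values are made up for the example:

```python
# Toy embedding layer as a look-up table: each integer index maps to
# one embedding vector (here of size 3).
embedding_table = {
    0: [0.0, 0.0, 0.0],   # index 0, usually reserved for padding
    1: [0.1, 0.5, -0.2],  # embedding for the word with index 1
    2: [0.9, -0.3, 0.4],  # embedding for the word with index 2
}

def embed(sequence):
    """Map a sequence of integer word indices to their embedding vectors."""
    return [embedding_table[i] for i in sequence]

print(embed([1, 2, 0]))  # three vectors of size 3
```

In a real embedding layer the values in the table are weights: they start random and are adjusted by backpropagation during training.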
It is trainable in the sense that the embedding values are not fixed and can change during training. The input_dim argument is the number of words (or, more generally, the number of distinct elements in the sequences). The output_dim argument specifies the dimension of each word embedding; for example, with output_dim=100, each word embedding is a vector of size 100. Further, since the input of an embedding layer is a sequence of integers (corresponding to the words in a sentence), its output has a shape of (num_sequences, len_sequence, output_dim), i.e. for each integer in a sequence, an embedding vector of size output_dim is returned.
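The shape behavior can be sketched with NumPy standing in for the trainable weight matrix (input_dim=1000 and output_dim=100 are arbitrary example values, not anything required by Keras):

```python
import numpy as np

input_dim = 1000   # vocabulary size (number of distinct indices)
output_dim = 100   # size of each word embedding

# The layer's weights: one row per index. In Keras these start from a
# random initialization and are updated during training.
weights = np.random.rand(input_dim, output_dim)

# A batch of 2 sequences, each of length 5 (integer word indices,
# padded with 0).
batch = np.array([[4, 20, 7, 0, 0],
                  [9, 13, 13, 2, 0]])

# The embedding look-up is just row indexing into the weight matrix.
output = weights[batch]
print(output.shape)  # (num_sequences, len_sequence, output_dim) = (2, 5, 100)
```

Note that the same index always picks out the same row, so repeated words in a sequence receive identical embedding vectors.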
As for the number of weights in an embedding layer, it is very easy to calculate: there are input_dim unique indices, and each index is associated with a word embedding of size output_dim. Therefore the number of weights in an embedding layer is input_dim x output_dim.
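Continuing the example values above (input_dim=1000, output_dim=100, chosen for illustration), the weight count works out to:

```python
input_dim = 1000   # vocabulary size
output_dim = 100   # embedding size

# One embedding vector of size output_dim per index.
num_weights = input_dim * output_dim
print(num_weights)  # 100000
```

This matches what model.summary() reports as the parameter count for an Embedding layer in Keras.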