Trying to understand CNNs for NLP tutorial using Tensorflow


Problem description

I am following this tutorial in order to understand CNNs in NLP. There are a few things which I don't understand despite having the code in front of me. I hope somebody can clear a few things up here.

The first rather minor thing is the sequence_length parameter of the TextCNN object. In the example on GitHub this is just 56, which I think is the maximum length of all sentences in the training data. This means that self.input_x is a 56-dimensional vector which simply contains, for each word of a sentence, its index in the dictionary.
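
For concreteness, a minimal sketch of that padding step (the toy vocabulary and sentence below are made up for illustration, not taken from the tutorial's preprocessing):

vocab = {"<PAD>": 0, "i": 1, "like": 2, "this": 3, "movie": 4}  # hypothetical vocabulary
sequence_length = 10  # the tutorial's data yields 56 here

sentence = "i like this movie".split()
input_x = [vocab[w] for w in sentence]
input_x += [vocab["<PAD>"]] * (sequence_length - len(input_x))  # pad with index 0
print(input_x)  # [1, 2, 3, 4, 0, 0, 0, 0, 0, 0]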

This list goes into tf.nn.embedding_lookup(W, self.input_x), which will return a matrix consisting of the word embeddings of the words given by self.input_x. According to this answer this operation is similar to indexing with numpy:

import numpy as np

matrix = np.random.random([1024, 64])  # e.g. 1024 embedding vectors of size 64
ids = np.array([0, 5, 17, 33])         # row indices to look up
print(matrix[ids])                     # returns the four corresponding rows

But the problem here is that self.input_x most of the time looks like [1 3 44 25 64 0 0 0 0 0 0 0 .. 0 0]. So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?

Another thing I don't get is how tf.nn.embedding_lookup is working here:

# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    # W is the embedding matrix, initialized uniformly at random in [-1, 1)
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    # Look up the embedding vector for every word index in the batch
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    # Add a channel dimension so the result can be fed to conv2d
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

I assume that self.embedded_chars is the matrix which is the actual input to the CNN, where each row represents the word embedding of one word. But how can tf.nn.embedding_lookup know about those indices given by self.input_x?

The last thing which I don't understand here is

W is our embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].
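
To make those shapes concrete, here is a minimal check (written for TensorFlow 2.x eager execution rather than the tutorial's TF 1.x graph code; the variable names only mirror the tutorial's):

import numpy as np
import tensorflow as tf

vocab_size, embedding_size, sequence_length = 1000, 128, 56
W = tf.Variable(tf.random.uniform([vocab_size, embedding_size], -1.0, 1.0))

# A batch of 2 padded index vectors, analogous to self.input_x
input_x = np.zeros([2, sequence_length], dtype=np.int32)

embedded_chars = tf.nn.embedding_lookup(W, input_x)
print(embedded_chars.shape)           # (2, 56, 128) -> [None, sequence_length, embedding_size]

embedded_chars_expanded = tf.expand_dims(embedded_chars, -1)
print(embedded_chars_expanded.shape)  # (2, 56, 128, 1), the extra axis is the "channel" for conv2d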

Does this mean that we are actually learning the word embeddings here? The tutorial states at the beginning:

We will not use pre-trained word2vec vectors for our word embeddings. Instead, we learn embeddings from scratch.

But I don't see a line of code where this is actually happening. The code of the embedding layer does not look as if anything is being trained or learned - so where is it happening?

Recommended answer

Answer to question 1 (So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?):

The 0s in the input vector are indices of the 0th symbol in the vocabulary, which is the PAD symbol. I don't think they get ignored when the lookup is performed; the 0th row of the embedding matrix will be returned.
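
A small sketch of that behaviour (assuming TensorFlow 2.x eager execution; the tiny matrix is made up for illustration):

import tensorflow as tf

W = tf.Variable(tf.random.uniform([5, 3], -1.0, 1.0))  # tiny embedding matrix, vocabulary of 5
ids = tf.constant([1, 3, 0, 0, 0])                      # a "sentence" padded with 0s

looked_up = tf.nn.embedding_lookup(W, ids)
# The padding positions are not ignored: they simply receive W's 0th row
print(tf.reduce_all(tf.equal(looked_up[2], W[0])).numpy())  # True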

Answer to question 2 (But how can tf.nn.embedding_lookup know about those indices given by self.input_x?):

The size of the embedding matrix is [V * E], where V is the size of the vocabulary and E is the dimension of the embedding vector. The 0th row of the matrix is the embedding vector for the 0th element of the vocabulary, the 1st row is the embedding vector for the 1st element, and so on. From the input vector x we get the indices of the words in the vocabulary, and those indices are used to index the embedding matrix.

Answer to question 3 (Does this mean that we are actually learning the word embeddings here?):

Yes, we are actually learning the embedding matrix. In the embedding layer, in the line W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W"), W is the embedding matrix, and in TensorFlow trainable=True by default for a variable. So W will also be a learned parameter. To use a pre-trained model, set trainable=False.
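
A minimal sketch of the two alternatives (TF 2.x style; pretrained_vectors is a placeholder name for vectors you would load yourself, e.g. from word2vec):

import numpy as np
import tensorflow as tf

vocab_size, embedding_size = 1000, 128

# Learned from scratch: trainable=True is the default, so the optimizer updates W
W_learned = tf.Variable(
    tf.random.uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")

# Pre-trained and frozen: initialize from loaded vectors and set trainable=False
pretrained_vectors = np.random.rand(vocab_size, embedding_size).astype(np.float32)  # stand-in data
W_fixed = tf.Variable(pretrained_vectors, trainable=False, name="W_pretrained")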

For a detailed explanation of the code you can follow this blog post: https://agarnitin86.github.io/blog/2016/12/23/text-classification-cnn
