What is the purpose of weights and biases in tensorflow word2vec example?


Question

I'm trying to understand how the word2vec example works, and I don't really understand the purpose of the weights and biases passed into the nce_loss function. There are two variable inputs into the function: weights (plus biases) and embeddings.

# Look up embeddings for inputs.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

Both are randomly initialized and (as far as I understand) both are subject to updates during learning.

# Compute the average NCE loss for the batch.
loss = tf.reduce_mean(
  tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
                 num_sampled, vocabulary_size))
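
To see the role of the weights and biases, it helps to view nce_weights and nce_biases as the output layer of the network: they score how well an input embedding matches each candidate output word. Below is a minimal NumPy sketch of what tf.nn.nce_loss computes conceptually; the real op also corrects for the sampling distribution, and the function name and neg_ids are illustrative, not part of the example.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss_sketch(weights, biases, embed, true_ids, neg_ids):
    """weights: [vocab, dim], biases: [vocab], embed: [batch, dim],
    true_ids: [batch] true context words, neg_ids: [num_sampled] noise words."""
    # Score of each true (input, context) pair: a dot product plus a bias.
    true_logits = np.sum(embed * weights[true_ids], axis=1) + biases[true_ids]
    # Scores against the sampled noise words, shared across the batch.
    noise_logits = embed @ weights[neg_ids].T + biases[neg_ids]
    # Logistic loss: push true pairs toward 1, noise pairs toward 0.
    loss = (-np.log(sigmoid(true_logits))
            - np.sum(np.log(sigmoid(-noise_logits)), axis=1))
    return loss.mean()

So the weights and biases are trained, but only as a scoring layer for the loss; the word vectors the tutorial ultimately cares about live in embeddings.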

I suppose both of them should represent the trained model. However, the weights and biases are never used later for similarity calculations. Instead, only one component is used:

# Compute the cosine similarity between minibatch examples and all embeddings.
norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
normalized_embeddings = embeddings / norm
valid_embeddings = tf.nn.embedding_lookup(
  normalized_embeddings, valid_dataset)
similarity = tf.matmul(
  valid_embeddings, normalized_embeddings, transpose_b=True)
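
For context, the full tutorial turns this similarity matrix into readable output roughly as follows (a sketch assuming valid_examples and reverse_dictionary from the complete script, evaluated inside a running session):

# Print the nearest neighbours of each validation word.
top_k = 8
sim = similarity.eval()  # shape [len(valid_examples), vocabulary_size]
for i, word_id in enumerate(valid_examples):
    # Sort by descending cosine similarity; skip slot 0 (the word itself).
    nearest = (-sim[i, :]).argsort()[1:top_k + 1]
    print(reverse_dictionary[word_id], '->',
          ', '.join(reverse_dictionary[k] for k in nearest))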

So what about the second component of the model? Why are the weights and biases being ignored?

Thanks.

Answer

In word2vec what you want is a vector representation of words. In order to do that you can use, among other things, a neural network. So you have input neurons, outputs, and hidden layers. To learn the vector representation, you use a hidden layer whose number of neurons is the same as the dimension you want for your vectors. There is one input per word and one output per word. You then train the network to predict the outputs from the inputs, but in the middle you have a smaller layer which you can see as an encoding of the input as a vector. So this is where the weights and biases live. But you don't need them later: what you use for testing is a dictionary which maps each word to the vector that represents it. This is faster than running the neural network to get the representation. That is why you don't see them later.

The last code you quote, about the cosine distance, is there to find which vectors are close to a calculated vector. You take some words (vectors), perform some operations on them (like: king - man + woman), and get a vector whose nearest match you want to identify. The cosine similarity is computed against all the vectors (queen would have the minimum distance to the result vector of the operation).
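
As an illustration, here is a toy sketch of that analogy lookup, assuming norm_emb is the [vocabulary_size, embedding_size] matrix of normalized embeddings and word_to_id / id_to_word are the usual lookup dicts (none of these names appear in the original script):

import numpy as np

def analogy(a, b, c, norm_emb, word_to_id, id_to_word):
    """Solve a : b :: c : ? by vector arithmetic, e.g. man : king :: woman : ?"""
    target = (norm_emb[word_to_id[b]]
              - norm_emb[word_to_id[a]]
              + norm_emb[word_to_id[c]])
    target /= np.linalg.norm(target)
    # Cosine similarity against every word; the highest score is the answer.
    # (Real implementations usually exclude a, b, c themselves as candidates.)
    scores = norm_emb @ target
    return id_to_word[int(np.argmax(scores))]

# analogy('man', 'king', 'woman', ...) should ideally return 'queen'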

To sum up, you don't see the weights and biases in the validation phase because you don't need them. You use the dictionary you created during training.

UPDATE: s0urcer has explained better how the vector representation is created.

The input layer and the output layer of the network represent words. The value is 0 if the word is absent and 1 if it is present: the first position corresponds to one word, the second to another, and so on. You have as many input/output neurons as there are words.

The middle layer is the context, or your vector representation of the words.

Now you train the network with sentences or groups of consecutive words. From each group you take one word and set it on the inputs, and the other words are the outputs of the network. So basically the network learns how a word is related to the other words in its context.
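
For instance, with a window of one word on each side, the (input, output) training pairs look like this (a toy sketch; the tutorial's generate_batch function does the same thing in batched form):

words = ['the', 'quick', 'brown', 'fox', 'jumps']
window = 1
pairs = []
for i, center in enumerate(words):
    for j in range(max(0, i - window), min(len(words), i + window + 1)):
        if j != i:
            pairs.append((center, words[j]))  # (input word, context word)
# pairs -> [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ...]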

To get the vector representation of a word, you set that word's input neuron to 1 and read the values of the context layer (the middle layer). Those values are the components of the vector. Since all the inputs are 0 except the one word that is 1, those values are exactly the weights of the connections from that input neuron to the context layer.

You don't use the network later because you don't need to compute all the values of the context layer; that would be slower. You only need to look up in your dictionary what those values are for the word.
