What is the purpose of weights and biases in tensorflow word2vec example?


Question

I'm trying to understand how the word2vec example works, but I don't really understand the purpose of the weights and biases passed into the nce_loss function. There are two variable inputs into the function: the weights (plus biases) and the embeddings.

import math
import tensorflow as tf

# Look up embeddings for inputs.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

Both are randomly initialized and (as far as I understand) both are subject to updates during learning.

# Compute the average NCE loss for the batch.
loss = tf.reduce_mean(
  tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
                 num_sampled, vocabulary_size))

I suppose both of them should represent the trained model. However, the weights and biases are never used later on for similarity calculations. Instead, only one component is used:

# Compute the cosine similarity between minibatch examples and all embeddings.
norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
normalized_embeddings = embeddings / norm
valid_embeddings = tf.nn.embedding_lookup(
  normalized_embeddings, valid_dataset)
similarity = tf.matmul(
  valid_embeddings, normalized_embeddings, transpose_b=True)

So what about the second component of the model? Why are the weights and biases ignored?

Thanks.

Answer

In word2vec what you want is a vector representation of words. To get that you can use, among other things, a neural network: you have input neurons, output neurons, and a hidden layer. The way to learn the vector representation is to give the hidden layer as many neurons as the number of dimensions you want your vectors to have. There is one input per word and one output per word. You then train the network to predict the outputs from the input, but in the middle you have a smaller layer, which you can see as an encoding of the input as a vector. That is where the weights and biases live. But you don't need them later: what you use for testing is a dictionary that maps each word to the vector representing it. That is faster than running the neural network to get the representation, and that is why you don't see them later.
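
As a rough sketch of that architecture (plain NumPy, illustrative names only, not the tutorial's code): for a one-hot input, the hidden layer is just one row of the input weight matrix, and a second weight matrix plus biases, playing the role of nce_weights and nce_biases, turns the hidden vector into scores over the vocabulary.

import numpy as np

vocabulary_size, embedding_size = 5, 3

# Input-to-hidden weights: this matrix is the embedding table.
W_in = np.random.uniform(-1.0, 1.0, (vocabulary_size, embedding_size))

# Hidden-to-output weights and biases: the role nce_weights / nce_biases play.
W_out = np.random.randn(vocabulary_size, embedding_size)
b_out = np.zeros(vocabulary_size)

word_index = 2
one_hot = np.eye(vocabulary_size)[word_index]

hidden = one_hot @ W_in          # equals W_in[word_index]: the word's vector
logits = W_out @ hidden + b_out  # scores over the vocabulary (the "second component")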

The last piece of code you show, about the cosine distance, is there to find which vectors are close to your calculated vector. You take some words (vectors), perform some operations on them (like: king - man + woman), and then you have a result vector that you want to map back to a word. The cosine function is run against all the vectors (queen would have the minimum distance to the result vector of the operation).
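
In code, that analogy lookup could look something like this (a toy sketch with random vectors, so the winner here is meaningless; with real trained embeddings, queen would come out on top):

import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# A toy word -> vector dictionary standing in for the trained embeddings.
rng = np.random.default_rng(0)
word_vectors = {w: rng.standard_normal(3) for w in ["king", "man", "woman", "queen"]}

target = word_vectors["king"] - word_vectors["man"] + word_vectors["woman"]
closest = max(word_vectors, key=lambda w: cosine(word_vectors[w], target))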

To sum up, you don't see the weights and biases in the validation phase because you don't need them. You use the dictionary you created during training.

UPDATE: s0urcer has explained better how the vector representation is created.

The input layer and the output layer of the network represent words. A value is 0 if the word is not there and 1 if it is. The first position is one word, the second another one, etc. You have as many input/output neurons as there are words.

The middle layer is the context, or your vector representation of the words.

Now you train the network with sentences, or groups of consecutive words. From such a group you take one word and set it as the input, and the other words are the outputs of the network. So basically the network learns how a word is related to the other words in its context.
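
A sketch of how those (input, output) training pairs might be generated from a window of consecutive words (illustrative code, not the tutorial's batch generator):

sentence = ["the", "quick", "brown", "fox", "jumps"]
window = 1

pairs = []
for i, word in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((word, sentence[j]))  # (input word, context/output word)

# pairs == [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ...]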

To get the vector representation of a word, you set that word's input neuron to 1 and read the values of the context layer (the middle layer). Those values are the vector. Since all the inputs are 0 except the one word that is 1, those values are exactly the weights of the connections from that input neuron to the context layer.
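
That row-selection behaviour is easy to verify (a small NumPy sketch; the matrix stands in for the trained input-to-hidden weights), and it is also why tf.nn.embedding_lookup can replace the matrix multiplication entirely:

import numpy as np

vocabulary_size, embedding_size = 5, 3
W_in = np.random.uniform(-1.0, 1.0, (vocabulary_size, embedding_size))

word_index = 2
one_hot = np.eye(vocabulary_size)[word_index]

# A one-hot vector times the weight matrix just selects one row.
assert np.allclose(one_hot @ W_in, W_in[word_index])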

You don't use the network later because you don't need to recompute all the values of the context layer; that would be slower. You only need to look up in your dictionary what those values are for the word.
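
Building that dictionary once after training might look like this (a sketch with toy stand-ins; in the tutorial the trained matrix and the index-to-word map correspond roughly to final_embeddings and reverse_dictionary, neither of which is shown in the snippets above):

import numpy as np

# Toy stand-ins for what training produces.
vocabulary_size, embedding_size = 4, 3
final_embeddings = np.random.randn(vocabulary_size, embedding_size)
reverse_dictionary = {0: "the", 1: "king", 2: "queen", 3: "man"}

# The word -> vector dictionary used at test time instead of the network.
word_vectors = {reverse_dictionary[i]: final_embeddings[i]
                for i in range(vocabulary_size)}

vector = word_vectors["king"]  # constant-time lookup, no forward pass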
