Using Word2vec with Tensorflow on Windows


Problem description

In this tutorial file from Tensorflow, the following line (line 45) loads the word2vec "extension":

word2vec = tf.load_op_library(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'word2vec_ops.so'))

I am using Windows 10, and as is also pointed out in this SO question, .so files are for Linux.

What is the equivalent extension to load on Windows?

Also, I don't understand why so much else is included in Tensorflow upon installation, but Word2Vec has to be built locally. In the documentation, Installing TensorFlow on Windows, there is no mention of having to build these extensions.

Was this an old practice that has since changed, so that everything now ships with the installation? If so, how does that change apply to the word2vec module in the example?

Recommended answer

Yes, it has changed! Tensorflow now includes a helper function, tf.nn.embedding_lookup, that makes it very easy to embed your data.
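
To see what the lookup does in isolation, here is a minimal standalone sketch (the toy matrix and ids below are illustrative, not from the original answer); tf.nn.embedding_lookup simply gathers the rows of the embedding matrix selected by the ids:

import tensorflow as tf

# Toy 3-word vocabulary with 2-dimensional embeddings (illustrative values).
embeddings = tf.constant([[0.0, 0.1],
                          [1.0, 1.1],
                          [2.0, 2.1]])
ids = tf.constant([2, 0])

# Gathers rows 2 and 0 of the embedding matrix, in that order.
looked_up = tf.nn.embedding_lookup(embeddings, ids)

with tf.Session() as sess:
    print(sess.run(looked_up))  # [[2.  2.1]
                                #  [0.  0.1]]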

In a full model, you can use it by doing something like this:

import math

import tensorflow as tf

# Embedding matrix: one trainable embedding_size-dim vector per vocabulary word.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

# Weights and biases of the NCE (noise-contrastive estimation) output layer.
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Placeholders for a batch of input word ids and their target labels.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

# Look up the embedding vectors for the input word ids.
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Compute the NCE loss, using a sample of the negative labels each time.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

# We use the SGD optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)

for inputs, labels in generate_batch(...):
    feed_dict = {train_inputs: inputs, train_labels: labels}
    _, cur_loss = session.run([optimizer, loss], feed_dict=feed_dict)
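
Note that this snippet assumes a session object and a generate_batch helper that are defined elsewhere in the tutorial, along with the hyperparameters (vocabulary_size, embedding_size, batch_size, num_sampled). A minimal sketch of the missing session setup might look like this (illustrative only; the linked full code is the reference):

# Illustrative session setup for the training loop above.
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    # ... run the training loop shown above inside this block ...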

The full code is here.

In general, I would be hesitant to rely too much on the tensorflow/models repo. Parts of it are quite out of date. The main tensorflow/tensorflow repo is better maintained.
