How to use a pretrained Word2Vec model in TensorFlow
Question
I have a Word2Vec model that was trained in Gensim. How can I use it in TensorFlow for word embeddings? I don't want to train the embeddings from scratch in TensorFlow. Can someone show me how to do it with some example code?
Recommended answer
Let's assume you have a dictionary vocab and a list inv_dict, where a word's position in the list reflects its frequency rank (most common words first):
vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
inv_dict = ['hello', 'neural', 'world', 'networks']
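If you do not have these two structures yet, one way to build them is to count word frequencies and assign indices in descending order of frequency. This is a minimal sketch, assuming a small tokenized corpus; the `corpus` variable and its contents are hypothetical and stand in for your own data.

```python
from collections import Counter

# Hypothetical tokenized corpus; replace with your own data.
corpus = [
    ['hello', 'world'],
    ['hello', 'neural', 'networks'],
    ['hello', 'neural', 'world'],
]

# Count how often each word occurs across all sentences
counts = Counter(word for sentence in corpus for word in sentence)

# Sort by descending frequency; ties are broken alphabetically so the order is deterministic
inv_dict = [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

# Map each word to its position in inv_dict
vocab = {word: idx for idx, word in enumerate(inv_dict)}
```

With this toy corpus the result matches the hand-written vocab and inv_dict shown above.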
Notice how each index in inv_dict corresponds to the value stored for that word in vocab. Now declare your embedding matrix and fill it with the pretrained vectors:
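import numpy as np
from gensim.models.keyedvectors import KeyedVectors

vocab_size = len(inv_dict)
emb_size = 300  # or whatever the size of your embeddings is
# float32 matches TensorFlow's default float type
embeddings = np.zeros((vocab_size, emb_size), dtype=np.float32)

# Load the pretrained vectors saved in word2vec binary format
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)
for k, v in vocab.items():
    embeddings[v] = model[k]  # row v holds the vector for word k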
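The loop above raises a KeyError if a word in vocab has no vector in the pretrained model. A possible way to handle that, sketched here under assumptions, is to fall back to a small random vector for missing words. The helper name `build_embedding_matrix`, the range of the random values, and the toy dictionary standing in for the Gensim model are all hypothetical; the helper relies only on `word in vectors` and `vectors[word]`, which both KeyedVectors and a plain dict support.

```python
import numpy as np

def build_embedding_matrix(vocab, vectors, emb_size, seed=0):
    """Hypothetical helper: row i receives the vector of the word whose index is i.

    `vectors` is anything supporting `word in vectors` and `vectors[word]`,
    such as gensim's KeyedVectors or a plain dict.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab), emb_size), dtype=np.float32)
    missing = []
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
        else:
            # No pretrained vector available: use a small random vector instead
            matrix[idx] = rng.uniform(-0.25, 0.25, emb_size)
            missing.append(word)
    return matrix, missing

# Toy stand-in for the pretrained model, with 'networks' deliberately absent
toy_vectors = {
    'hello': np.ones(4, dtype=np.float32),
    'world': np.full(4, 2.0, dtype=np.float32),
    'neural': np.full(4, 3.0, dtype=np.float32),
}
toy_vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
toy_embeddings, toy_missing = build_embedding_matrix(toy_vocab, toy_vectors, emb_size=4)
```

Returning the list of missing words makes it easy to check how much of the vocabulary the pretrained model actually covers.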
You now have your embedding matrix. Good. Now let's assume you want to train on the sample x = ['hello', 'world']. The neural net cannot consume strings directly, so we need to convert the words to integers:
x_train = []
for word in x:
    x_train.append(vocab[word])  # integerize
x_train = np.array(x_train)  # make into a numpy array
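A placeholder of shape [None, input_size] expects a batch in which every sample has the same length, so samples of different lengths need to be padded or truncated. The following is a rough sketch of one way to do that. The helper `encode_batch`, the `<pad>` token, and the toy vocabulary that reserves index 0 for padding are assumptions for illustration and are not part of the vocabulary used above.

```python
import numpy as np

def encode_batch(sentences, vocab, input_size, pad_id=0):
    """Hypothetical helper: map words to ids, then truncate or pad each row to input_size."""
    batch = np.full((len(sentences), input_size), pad_id, dtype=np.int32)
    for row, sentence in enumerate(sentences):
        ids = [vocab[word] for word in sentence][:input_size]
        batch[row, :len(ids)] = ids
    return batch

# Toy vocabulary that reserves index 0 for padding (an assumption for this sketch)
toy_vocab = {'<pad>': 0, 'hello': 1, 'neural': 2, 'world': 3, 'networks': 4}
x_batch = encode_batch(
    [['hello', 'world'], ['neural', 'networks', 'hello']],
    toy_vocab,
    input_size=3,
)
```

If you reserve a padding index like this, the embedding matrix needs a matching row for it, typically left as zeros.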
Now we are ready to embed our samples on the fly:
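import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)
# input_size is the number of word indices per sample
x_model = tf.placeholder(tf.int32, shape=[None, input_size])
with tf.device("/cpu:0"):
    embedded_x = tf.nn.embedding_lookup(embeddings, x_model)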
Now embedded_x goes into your convolution or whatever layers come next. I am also assuming you are not retraining the embeddings, but simply using them. Hope that helps.
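For intuition, an embedding lookup on a single matrix simply gathers the rows selected by the integer ids. The NumPy sketch below illustrates that idea with made-up values; it is not the TensorFlow call itself, and the names `toy_embeddings` and `toy_ids` are hypothetical.

```python
import numpy as np

# Toy embedding matrix: 4 words, 3 dimensions (values chosen only for illustration)
toy_embeddings = np.arange(12, dtype=np.float32).reshape(4, 3)

# One sample of two word ids, shaped [batch, input_size] like x_model
toy_ids = np.array([[0, 2]], dtype=np.int32)

# Indexing with an integer array gathers the matching rows,
# which is what an embedding lookup on a single matrix computes
toy_embedded = toy_embeddings[toy_ids]
print(toy_embedded.shape)  # (1, 2, 3)
```

The output gains one trailing dimension of size emb_size, so a batch of shape [batch, input_size] becomes [batch, input_size, emb_size].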