How to find the closest word to a vector using word2vec

Question

I have just started using Word2vec and I was wondering how to find the closest word to a given vector. Suppose I have this vector, which is the average of a set of vectors:

array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)

Is there a straightforward way to find the word in my training data that is most similar to this vector?

Or is the only solution to calculate the cosine similarity between this vector and the vector of each word in my training data, and then select the closest one?

Thanks.
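The brute-force approach described in the question can be sketched in a few lines of NumPy. This is a minimal sketch, not gensim's implementation: the function `closest_words`, the four-word vocabulary and its 3-dimensional vectors are hypothetical stand-ins for a trained model (assuming gensim 4, the real words and vectors could be taken from `model.wv.index_to_key` and `model.wv.vectors`).

```python
import numpy as np


def closest_words(query, words, vectors, topn=1, exclude=()):
    """Return the `topn` (word, cosine similarity) pairs closest to `query`.

    Hypothetical helper doing a brute-force search: `query` is compared
    against every row of `vectors`, where `vectors[i]` is the vector of `words[i]`.
    """
    # Normalise the word vectors and the query to unit length, so that a
    # plain dot product equals cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)

    # One matrix-vector product gives the similarity to every word at once.
    sims = unit_vectors @ unit_query

    # Walk the words from most to least similar, skipping excluded ones.
    results = []
    for i in np.argsort(-sims):
        if words[i] in exclude:
            continue
        results.append((words[i], float(sims[i])))
        if len(results) == topn:
            break
    return results


# Hypothetical toy vocabulary with 3-dimensional vectors
# (real word2vec vectors usually have a few hundred dimensions).
words = ["cat", "dog", "car", "truck"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.5],
], dtype=np.float32)

# Average a set of word vectors, as in the question.
query = vectors[[0, 1]].mean(axis=0)

print(closest_words(query, words, vectors, topn=2))
print(closest_words(query, words, vectors, topn=1, exclude={"cat", "dog"}))
```

Because the query here is the average of the `cat` and `dog` vectors, those two words come out on top; the hypothetical `exclude` argument is one way to look past the words that produced the average.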

Recommended answer

The gensim implementation of word2vec has a most_similar() function that finds words semantically close to a given word:

>>> model.wv.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]
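The positive/negative query above can be illustrated with a small NumPy sketch. It assumes the query works roughly like this: the unit-normalised vectors of the positive words are added, those of the negative words subtracted, the result is averaged and normalised, and every vocabulary word is then ranked by cosine similarity to it, skipping the query words. `most_similar_sketch` and the toy vectors are hypothetical; this is an illustration, not gensim's own code.

```python
import numpy as np


def most_similar_sketch(positive, negative, words, vectors, topn=1):
    """Rank vocabulary words by cosine similarity to a combination of words.

    Hypothetical helper: positive words count with weight +1, negative words
    with weight -1, and the query words are left out of the results.
    """
    index = {w: i for i, w in enumerate(words)}
    # Unit-normalise every vector so that dot products are cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

    # Average the signed unit vectors, then normalise the result.
    terms = [unit[index[w]] for w in positive] + [-unit[index[w]] for w in negative]
    target = np.mean(terms, axis=0)
    target /= np.linalg.norm(target)

    # Brute-force scan: similarity of the target to every vocabulary word.
    sims = unit @ target
    skip = set(positive) | set(negative)
    ranked = [(words[i], float(sims[i])) for i in np.argsort(-sims) if words[i] not in skip]
    return ranked[:topn]


# Hypothetical toy vectors; the axes loosely mean (royalty, male, female).
words = ["man", "woman", "king", "queen", "apple"]
vectors = np.array([
    [0.1, 1.0, 0.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.1, 0.1],
], dtype=np.float32)

print(most_similar_sketch(positive=["woman", "king"], negative=["man"],
                          words=words, vectors=vectors))
```

With these toy vectors `queen` is ranked first. Note that the final ranking step is the same whole-vocabulary cosine scan the question describes.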

or to its vector representation:

>>> from numpy import array, float32
>>> your_word_vector = array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
>>> model.wv.most_similar(positive=[your_word_vector], topn=1)

where topn defines the desired number of returned results.

However, my gut feeling is that the function does exactly what you proposed, i.e. it calculates the cosine similarity between the given vector and every other vector in the vocabulary (a brute-force scan, so its cost grows linearly with the vocabulary size).