Bigram to a vector

Question

I want to construct word embeddings for documents using the word2vec tool. I know how to find the vector embedding for a single word (unigram). Now I want to find a vector for a bigram. Is this possible with word2vec? If so, how?

Recommended answer

The following snippet returns the vector representation of a bigram, or None if the bigram is not in the trained model's vocabulary. Here unigrams is the tokenized corpus: a list of sentences, each a list of words. Note that the bigram you look up must have an underscore instead of a space between the words: bigram2vec(unigrams, "this report") is wrong; it should be bigram2vec(unigrams, "this_report"). For more details on generating the unigrams, see the documentation for the gensim.models.word2vec.Word2Vec class.

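from gensim.models import Word2Vec
from gensim.models.phrases import Phrases

def bigram2vec(unigrams, bigram_to_search):
    # Learn which adjacent word pairs co-occur often enough to be treated
    # as phrases; detected pairs are joined with "_" (e.g. "this_report").
    bigrams = Phrases(unigrams)
    # Train word2vec on the corpus with detected bigrams merged into single tokens.
    model = Word2Vec(bigrams[unigrams])
    # A bigram that was not detected, or was pruned as too rare, has no vector.
    if bigram_to_search in model.wv:
        return model.wv[bigram_to_search]
    return None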
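
To see why the lookup key needs an underscore, here is a minimal pure-Python sketch of what a phrase-detection step does to the token stream before word2vec is trained. This is not gensim's implementation: the function name merge_frequent_bigrams and its parameters are made up for illustration, and it uses a plain count threshold, whereas gensim's Phrases applies a scoring function.

```python
from collections import Counter

def merge_frequent_bigrams(sentences, min_count=2, delimiter="_"):
    """Join adjacent word pairs seen at least `min_count` times into one token.

    Hypothetical, simplified stand-in for phrase detection: it uses a raw
    count threshold rather than the scoring that gensim's Phrases applies.
    """
    # Count every adjacent pair of words across the whole corpus.
    pair_counts = Counter(
        pair for sentence in sentences for pair in zip(sentence, sentence[1:])
    )
    merged_sentences = []
    for sentence in sentences:
        merged, i = [], 0
        while i < len(sentence):
            pair = tuple(sentence[i:i + 2])
            if len(pair) == 2 and pair_counts[pair] >= min_count:
                merged.append(delimiter.join(pair))
                i += 2  # both words are consumed by the merged token
            else:
                merged.append(sentence[i])
                i += 1
        merged_sentences.append(merged)
    return merged_sentences

corpus = [
    ["this", "report", "is", "long"],
    ["read", "this", "report", "today"],
]
merged = merge_frequent_bigrams(corpus)
print(merged)
```

After this step the pair appears in the training data only as the single token "this_report", so that is the only form a model trained on the merged corpus could have a vector for; the string "this report" with a space never occurs as a token.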
