Creating a word vector model combining words from other models


Question

I have two different word vector models, both created with the word2vec algorithm. The issue I am facing is that a few words from the first model are not present in the second. I want to create a third model from the two, in which I can use word vectors from both without losing the meaning and context of the vectors.

Can I do this, and if so, how?

Recommended answer

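Vectors from separately trained models live in different coordinate spaces, so they cannot simply be copied from one model into the other. You could potentially translate the vectors for the words that appear in only one model into the other model's coordinate space, using the words the two models share to learn a translation function.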

Recent gensim versions include a facility for this: see the TranslationMatrix tool. A demo Jupyter notebook is included in the docs/notebooks directory, viewable online at:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/translation_matrix.ipynb

You'd presumably take the larger model (or whichever one is thought to be better, perhaps because it was trained on more data) and translate the smaller number of words it's missing into its space. You'd use as many common-reference 'anchor' words as is practical.
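To make the idea concrete, here is a minimal sketch of the anchor-word approach using only NumPy. It is not gensim's TranslationMatrix API: plain dicts mapping word to vector stand in for the two models, the function names `learn_translation` and `merge_models` are hypothetical, and the toy data is synthetic (the "small" model is just a rotated copy of the "big" one, so a linear map fits exactly). With real word2vec models the fit will only be approximate.

```python
import numpy as np


def learn_translation(src_vectors, tgt_vectors, anchor_words):
    """Least-squares fit of W so that src_vectors[w] @ W ~= tgt_vectors[w]."""
    X = np.array([src_vectors[w] for w in anchor_words])
    Y = np.array([tgt_vectors[w] for w in anchor_words])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def merge_models(big, small):
    """Keep every vector of `big`; project words found only in `small`
    into big's coordinate space using the shared words as anchors."""
    anchors = sorted(set(big) & set(small))
    W = learn_translation(small, big, anchors)
    merged = dict(big)
    for word, vec in small.items():
        if word not in merged:
            merged[word] = np.asarray(vec) @ W
    return merged


# Synthetic toy data (an assumption for illustration, not real embeddings).
rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "apple", "orange", "car", "road"]
truth = {w: rng.normal(size=4) for w in vocab}

# A random orthogonal matrix plays the role of the unknown difference
# between the two models' coordinate systems.
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))

big = {w: truth[w] for w in vocab[:-1]}        # larger model, lacks "road"
small = {w: truth[w] @ R for w in vocab[2:]}   # smaller model, other coordinates

merged = merge_models(big, small)
print(sorted(merged))
```

In this toy setup the five shared words act as anchors, and "road" is the only word that has to be translated. The more anchors that are available relative to the vector dimensionality, the better constrained the learned mapping is.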
