PyTorch / Gensim - How to load pre-trained word embeddings


Question

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

So my question is: how do I get the embedding weights loaded by gensim into the PyTorch embedding layer?

Thanks in advance!

Answer

I just wanted to report my findings about loading a gensim embedding with PyTorch.

  • Solution for PyTorch 0.4.0 and newer:

From v0.4.0 there is a new function from_pretrained() which makes loading an embedding very convenient. Here is an example from the documentation.

import torch
import torch.nn as nn

# FloatTensor containing pretrained weights
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
embedding = nn.Embedding.from_pretrained(weight)
# Get embeddings for index 1
input = torch.LongTensor([1])
embedding(input)
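Note that from_pretrained() freezes the weights by default (freeze=True), so they are not updated during training. If you want to fine-tune the embedding, pass freeze=False:

embedding = nn.Embedding.from_pretrained(weight, freeze=False)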

The weights from gensim can easily be obtained by:

import torch
import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors)  # formerly syn0, which is soon deprecated
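Putting both pieces together, here is a minimal sketch of the full pipeline. It assumes a word2vec file at 'path/to/file' and gensim 4.x, where the word-to-index mapping is model.key_to_index (in gensim 3.x it was model.vocab['word'].index):

import torch
import torch.nn as nn
import gensim

# load the pre-trained vectors and wrap them in a frozen embedding layer
model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors)
embedding = nn.Embedding.from_pretrained(weights)

# look up a word's vector through gensim's vocabulary mapping
idx = model.key_to_index['word']  # gensim 4.x; use model.vocab['word'].index in 3.x
vector = embedding(torch.LongTensor([idx]))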

As noted by @Guglie: in newer gensim versions the weights can be obtained from model.wv:

weights = torch.FloatTensor(model.wv.vectors)
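The .wv attribute applies when you have a full Word2Vec model object rather than plain KeyedVectors. A minimal sketch, assuming a trained model saved at 'path/to/model':

import torch
import gensim

# a full Word2Vec model keeps its vectors in the .wv KeyedVectors instance
model = gensim.models.Word2Vec.load('path/to/model')
weights = torch.FloatTensor(model.wv.vectors)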


  • Solution for PyTorch version 0.3.1 and older:

I'm using version 0.3.1 and from_pretrained() isn't available in this version.

Therefore I created my own from_pretrained so I can also use it with 0.3.1.

Code for from_pretrained for PyTorch versions 0.3.1 or lower:

def from_pretrained(embeddings, freeze=True):
    assert embeddings.dim() == 2, \
        'Embeddings parameter is expected to be 2-dimensional'
    rows, cols = embeddings.shape
    # build an embedding layer matching the shape of the weight matrix
    embedding = torch.nn.Embedding(num_embeddings=rows, embedding_dim=cols)
    embedding.weight = torch.nn.Parameter(embeddings)
    # a frozen embedding is excluded from gradient updates
    embedding.weight.requires_grad = not freeze
    return embedding

The embedding can then be loaded just like this:

embedding = from_pretrained(weights)
      
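One caveat: if I remember correctly, optimizers in these older PyTorch versions raise an error when given parameters with requires_grad=False, so with freeze=True you may need to filter the frozen weights out (model below stands for whichever module contains the embedding):

# pass only trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1)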

I hope this is helpful for someone.

