PyTorch / Gensim - How to load pre-trained word embeddings

Published: 2020/5/17 19:10:28. Tags: python neural-network pytorch gensim embedding


Question

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

So my question is: how do I get the embedding weights loaded by gensim into the PyTorch embedding layer?

Thanks in advance!

Recommended answer

I just wanted to report my findings about loading a gensim embedding with PyTorch.

Solution for PyTorch 0.4.0 and newer:

Since v0.4.0 there is a function, from_pretrained(), which makes loading an embedding very convenient. Here is an example from the documentation.

import torch
import torch.nn as nn

# FloatTensor containing pretrained weights
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
embedding = nn.Embedding.from_pretrained(weight)
# Get embeddings for index 1
input = torch.LongTensor([1])
embedding(input)
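
By default from_pretrained() freezes the loaded weights, which is the same behaviour the freeze argument controls in the hand-written version further down. The following is a minimal sketch, using the same toy weight matrix as a stand-in for real pretrained vectors, of how the freeze argument switches between fixed and fine-tunable embeddings. The variable names frozen and trainable are illustrative only.

```python
import torch
import torch.nn as nn

# Toy 2 x 3 weight matrix standing in for real pretrained vectors.
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])

# Default: freeze=True, so the weights are not updated during training.
frozen = nn.Embedding.from_pretrained(weight)

# freeze=False keeps the pretrained values as a starting point but lets
# the optimizer fine-tune them.
trainable = nn.Embedding.from_pretrained(weight, freeze=False)

indices = torch.LongTensor([1])
print(frozen.weight.requires_grad)     # False
print(trainable.weight.requires_grad)  # True
print(frozen(indices).shape)           # torch.Size([1, 3])
```

Whether to freeze depends on the task: frozen vectors keep the pretrained geometry intact, while trainable ones can adapt to the downstream data.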

The weights from gensim can be obtained as follows:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors) # formerly syn0, which is soon deprecated

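As noted by @Guglie: in newer gensim versions the weights are reached through model.wv. This applies when model is a full Word2Vec model rather than the KeyedVectors object loaded above. Note that model.wv is itself a KeyedVectors object, not a weight matrix, so the array to pass to PyTorch is model.wv.vectors:

weights = torch.FloatTensor(model.wv.vectors)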
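
The embedding layer is indexed by row number, not by word, so tokens have to be converted to the row indices of the weight matrix before lookup. Below is a minimal sketch of that step with a hypothetical three-word vocabulary. In gensim 4.x the corresponding word-to-row mapping is exposed as key_to_index on the KeyedVectors object (older releases used vocab[word].index). The helper to_indices and the choice to skip unknown words are assumptions made for illustration, not part of either library.

```python
# Hypothetical vocabulary, listed in the same order as the rows of the
# weight matrix (row 0 is "the", row 1 is "cat", row 2 is "sat").
index_to_word = ["the", "cat", "sat"]
word_to_index = {word: i for i, word in enumerate(index_to_word)}


def to_indices(tokens, word_to_index):
    """Map tokens to embedding row indices, skipping unknown words."""
    return [word_to_index[t] for t in tokens if t in word_to_index]


indices = to_indices(["the", "dog", "sat"], word_to_index)
print(indices)  # [0, 2]
```

The resulting list can be wrapped in torch.LongTensor and passed to the embedding layer. Skipping out-of-vocabulary words is only one option; reserving an extra row for an unknown-word vector is another common choice.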

Solution for PyTorch version 0.3.1 and older:

I'm using version 0.3.1, and from_pretrained() isn't available in this version.

Therefore I created my own from_pretrained so I can also use it with 0.3.1.

Code for from_pretrained for PyTorch versions 0.3.1 or lower:

def from_pretrained(embeddings, freeze=True):
    assert embeddings.dim() == 2, \
         'Embeddings parameter is expected to be 2-dimensional'
    rows, cols = embeddings.shape
    embedding = torch.nn.Embedding(num_embeddings=rows, embedding_dim=cols)
    embedding.weight = torch.nn.Parameter(embeddings)
    embedding.weight.requires_grad = not freeze
    return embedding

The embedding can then be loaded just like this:

embedding = from_pretrained(weights)

I hope this is helpful for someone.
