Embedding in PyTorch


Question

I have checked the PyTorch tutorial and questions similar to this one on Stack Overflow.

I am confused: does the embedding in PyTorch (nn.Embedding) make similar words closer to each other? Do I just need to give it all the sentences? Or is it just a lookup table, and I need to code the model myself?

Answer

nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e. the size of the vocabulary by the dimension of each vector embedding, and a method that does the lookup.
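For instance, a minimal sketch (with a hypothetical vocabulary of 10 words and 4-dimensional vectors) shows that calling the layer is just a row lookup into that weight Tensor:

import torch
from torch import nn

emb = nn.Embedding(10, 4)          # lookup table of shape (10, 4)

print(emb.weight.shape)            # torch.Size([10, 4])
idx = torch.LongTensor([2])        # index of one word in the vocabulary
print(torch.equal(emb(idx)[0], emb.weight[2]))  # True: the call just indexes into the table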

When you create an embedding layer, the Tensor is initialised randomly. It is only when you train it that this similarity between similar words appears, unless you have overwritten the values of the embedding with a previously trained model such as GloVe or Word2Vec, but that's another story.
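For that second case, a rough sketch: assuming you already have a matrix of pretrained vectors (for example read from a GloVe file; the random matrix below is only a placeholder), you can load it with nn.Embedding.from_pretrained:

import torch
from torch import nn

# Placeholder for vectors you would actually load from GloVe/Word2Vec,
# one row per word in your vocabulary
pretrained = torch.randn(1000, 128)

# freeze=True keeps the vectors fixed; freeze=False lets training fine-tune them
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)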

So, once you have the embedding layer defined and the vocabulary defined and encoded (i.e. a unique number assigned to each word in the vocabulary), you can use the instance of the nn.Embedding class to get the corresponding embeddings.
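As an illustration, assuming a small hypothetical vocabulary, the encoding and the lookup could look like this:

import torch
from torch import nn

# Hypothetical toy vocabulary: a unique index for each word
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

embedding = nn.Embedding(len(vocab), 8)   # 8-dimensional vectors, purely as an example

sentence = ["the", "cat", "sat"]
indices = torch.LongTensor([vocab[w] for w in sentence])
vectors = embedding(indices)              # shape (3, 8): one vector per word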

For example:

import torch
from torch import nn

# Table of 1000 embeddings, each of dimension 128
embedding = nn.Embedding(1000, 128)
# Look up the vectors for word indices 3 and 4
embedding(torch.LongTensor([3, 4]))

will return the embedding vectors corresponding to words 3 and 4 in your vocabulary. As no model has been trained, they will be random.
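Concretely, the returned tensor has one row per index, and the table itself is an ordinary learnable parameter, so it only becomes meaningful once a model that uses it is trained:

import torch
from torch import nn

embedding = nn.Embedding(1000, 128)
out = embedding(torch.LongTensor([3, 4]))
print(out.shape)                       # torch.Size([2, 128]): one random vector per word index

# The table is a regular Parameter, so any optimizer will update it during training,
# which is when similar words can end up with similar vectors
print(embedding.weight.requires_grad)  # True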

