How to invert a PyTorch Embedding?


Question


I have a multi-task encoder/decoder model in PyTorch with a (trainable) torch.nn.Embedding layer at the input.


In one particular task, I'd like to pre-train the model in a self-supervised way (to reconstruct masked input data) and use it for inference (to fill in gaps in data).


I guess for training I can just measure loss as the distance between the input embedding and the output embedding... But for inference, how do I invert an Embedding to reconstruct the proper category/token the output corresponds to? I can't see, e.g., a "nearest" function on the Embedding class...
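The distance-based training loss described above can be sketched as follows. This is a minimal illustration, not code from the question: the decoder output tensor here is a random stand-in, and MSE is just one choice of distance (cosine distance would work similarly).

```python
import torch

embedding = torch.nn.Embedding(1000, 100)    # same table used at the input
target_tokens = torch.tensor([3, 17, 42])    # ground-truth token ids
decoder_output = torch.randn(3, 100)         # stand-in for the model's output vectors

# Loss = distance between the output vectors and the embeddings of the
# target tokens; gradients flow back into the embedding table.
loss = torch.nn.functional.mse_loss(decoder_output, embedding(target_tokens))
loss.backward()
```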

Answer

You can do it quite easily:

import torch

embeddings = torch.nn.Embedding(1000, 100)   # 1000 tokens, 100 dimensions
my_sample = torch.randn(1, 100)              # a vector in embedding space

# Euclidean distance from the sample to every row of the embedding table
distance = torch.norm(embeddings.weight.data - my_sample, dim=1)
nearest = torch.argmin(distance)             # index of the closest token


Assuming you have 1000 tokens with 100 dimensions, this returns the nearest embedding based on Euclidean distance. You could also use other metrics in a similar manner.
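For a whole batch of output vectors, the same idea can be vectorized with torch.cdist, which computes all pairwise distances at once. A sketch (the batch of outputs is random here as a stand-in for real decoder outputs):

```python
import torch

torch.manual_seed(0)
embeddings = torch.nn.Embedding(1000, 100)
outputs = torch.randn(8, 100)                # batch of vectors to invert

# Pairwise Euclidean distances, shape (8, 1000)
dists = torch.cdist(outputs, embeddings.weight.data)
token_ids = torch.argmin(dists, dim=1)       # nearest token id per vector

# Sanity check: inverting an exact embedding recovers its own token id
exact = embeddings.weight.data[5:6]
recovered = torch.argmin(torch.cdist(exact, embeddings.weight.data), dim=1)
# recovered is tensor([5])
```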

