Embedding 3D data in Pytorch


Problem description

I want to implement character-level embedding.

This is the usual word embedding.

Word embedding

Input: [ ['who', 'is', 'this'] ]
-> [ [3, 8, 2] ]     # (batch_size, sentence_len)
-> // Embedding(Input)  # (batch_size, seq_len, embedding_dim)

This is what I want to do.

Character embedding

Input: [ [ ['w', 'h', 'o', 0], ['i', 's', 0, 0], ['t', 'h', 'i', 's'] ] ]
-> [ [ [2, 3, 9, 0], [ 11, 4, 0, 0], [21, 10, 8, 9] ] ]      # (batch_size, sentence_len, word_len)
-> // Embedding(Input) # (batch_size, sentence_len, word_len, embedding_dim)
-> // sum each character's embeddings  # (batch_size, sentence_len, embedding_dim)
The final output shape is the same as for word embedding, because I want to concatenate them later.
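One practical detail for this summing scheme: the padding index 0 is also looked up, so its embedding would be added into every padded word. `nn.Embedding`'s `padding_idx` argument keeps that row fixed at zero, so padding contributes nothing to the sum. A minimal check (the vocabulary and embedding sizes here are illustrative, not from the question):

```python
import torch
import torch.nn as nn

# padding_idx=0 pins the embedding row for index 0 to a zero vector,
# so padded character slots do not distort the per-word sum.
embedding = nn.Embedding(30, 8, padding_idx=0)

pad_vec = embedding(torch.tensor([0]))   # embedding of the padding index
print(pad_vec.abs().sum().item())        # 0.0
```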

Although I tried it, I am not sure how to implement 3-D embedding. Do you know how to implement such data?

def forward(self, x):
    print('x', x.size()) # (N, seq_len, word_len)
    bs = x.size(0)
    seq_len = x.size(1)
    word_len = x.size(2)
    embd_list = []
    for i, elm in enumerate(x):
        tmp = torch.zeros(1, word_len, self.embd_size)
        for chars in elm:
            tmp = torch.add(tmp, 1.0, self.embedding(chars.unsqueeze(0)))

The above code got an error because the output of self.embedding is a Variable.

TypeError: torch.add received an invalid combination of arguments - got (torch.FloatTensor, float, Variable), but expected one of:
 * (torch.FloatTensor source, float value)
 * (torch.FloatTensor source, torch.FloatTensor other)
 * (torch.FloatTensor source, torch.SparseFloatTensor other)
 * (torch.FloatTensor source, float value, torch.FloatTensor other)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, float, Variable)
 * (torch.FloatTensor source, float value, torch.SparseFloatTensor other)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, float, Variable)
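This mismatch is a pre-0.4 API issue: autograd Variables and plain tensors could not be mixed in torch.add. Wrapping the accumulator in Variable avoids it, and since PyTorch 0.4 (where Variable was merged into Tensor) the plain + operator just works. A minimal sketch on a modern version (sizes are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not from the question
vocab_size, embd_size, word_len = 30, 8, 4
embedding = nn.Embedding(vocab_size, embd_size)

chars = torch.tensor([2, 3, 9, 0])          # character indices of one padded word
tmp = torch.zeros(1, word_len, embd_size)   # accumulator
tmp = tmp + embedding(chars.unsqueeze(0))   # tensor + tensor: no type mismatch
```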

Update

I could do this, but the for loops are not efficient for batching. Do you guys know a more efficient way?

def forward(self, x):
    print('x', x.size()) # (N, seq_len, word_len)
    bs = x.size(0)
    seq_len = x.size(1)
    word_len = x.size(2)
    embd = Variable(torch.zeros(bs, seq_len, self.embd_size))
    for i, elm in enumerate(x): # every sample
        for j, chars in enumerate(elm): # every sentence. [ ['w', 'h', 'o', 0], ['i', 's', 0, 0], ['t', 'h', 'i', 's'] ]
            chars_embd = self.embedding(chars.unsqueeze(0)) # (1, word_len, embd_size) ['w','h','o',0]
            chars_embd = torch.sum(chars_embd, 1) # (N, embd_size). sum each char's embedding
            embd[i,j] = chars_embd[0] # set char_embd as word-like embedding

    x = embd # (N, seq_len, embd_dim)

Update2

This is my final code. Thank you, Wasi Ahmad!

def forward(self, x):
    # x: (N, seq_len, word_len)
    input_shape = x.size()
    bs = x.size(0)
    seq_len = x.size(1)
    word_len = x.size(2)
    x = x.view(-1, word_len) # (N*seq_len, word_len)
    x = self.embedding(x) # (N*seq_len, word_len, embd_size)
    x = x.view(*input_shape, -1) # (N, seq_len, word_len, embd_size)
    x = x.sum(2) # (N, seq_len, embd_size)

    return x
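For reference, the forward above can be exercised end to end like this; the module name, dict_size, and the sample batch are illustrative:

```python
import torch
import torch.nn as nn

class CharEmbedding(nn.Module):
    """Sums character embeddings into word-level vectors (sketch)."""
    def __init__(self, dict_size, embd_size):
        super().__init__()
        self.embedding = nn.Embedding(dict_size, embd_size)

    def forward(self, x):
        # x: (N, seq_len, word_len) of character indices
        input_shape = x.size()
        word_len = x.size(2)
        x = x.view(-1, word_len)       # (N*seq_len, word_len)
        x = self.embedding(x)          # (N*seq_len, word_len, embd_size)
        x = x.view(*input_shape, -1)   # (N, seq_len, word_len, embd_size)
        return x.sum(2)                # (N, seq_len, embd_size)

model = CharEmbedding(dict_size=30, embd_size=8)
batch = torch.tensor([[[2, 3, 9, 0], [11, 4, 0, 0], [21, 10, 8, 9]]])  # (1, 3, 4)
out = model(batch)
print(out.shape)  # torch.Size([1, 3, 8])
```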

Answer

I am assuming you have a 3d tensor of shape BxSxW where:

B = Batch size
S = Sentence length
W = Word length

And you have declared the embedding layer as follows.

self.embedding = nn.Embedding(dict_size, emsize)

where:

dict_size = No. of unique characters in the training corpus
emsize = Expected size of embeddings

So, now you need to convert the 3d tensor of shape BxSxW to a 2d tensor of shape BSxW and give it to the embedding layer.

emb = self.embedding(input_rep.view(-1, input_rep.size(2)))

The shape of emb will be BSxWxE where E is the embedding size. You can convert the resulting 3d tensor to a 4d tensor as follows.

emb = emb.view(*input_rep.size(), -1)

The final shape of emb will be BxSxWxE, which is what you are expecting.
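Putting the two steps together, the shapes can be verified with a toy tensor; B, S, W, dict_size, and emsize below are arbitrary example values:

```python
import torch
import torch.nn as nn

B, S, W = 2, 3, 4           # batch size, sentence length, word length (toy)
dict_size, emsize = 30, 5   # assumed vocabulary and embedding sizes

embedding = nn.Embedding(dict_size, emsize)
input_rep = torch.randint(0, dict_size, (B, S, W))       # (B, S, W)

emb = embedding(input_rep.view(-1, input_rep.size(2)))   # (B*S, W, E)
emb = emb.view(*input_rep.size(), -1)                    # (B, S, W, E)
print(emb.shape)  # torch.Size([2, 3, 4, 5])
```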
