Why do we "pack" the sequences in PyTorch?

Question

I was trying to replicate "How to use packing for variable-length sequence inputs for rnn", but I guess I first need to understand why we need to "pack" the sequence.

I understand why we need to "pad" them, but why is "packing" (through pack_padded_sequence) necessary?

Any high-level explanation would be appreciated!

Answer

I have stumbled upon this problem too and below is what I figured out.

When training an RNN (LSTM, GRU, or vanilla RNN), it is difficult to batch variable-length sequences. For example: if the lengths of the sequences in a batch of size 8 are [4, 6, 8, 5, 4, 3, 7, 8], you will pad all the sequences, and that will result in 8 sequences of length 8. You would end up doing 64 computations (8x8), but you only needed to do 45. Moreover, if you wanted to do something fancy like using a bidirectional RNN, it would be harder to do batch computations just by padding, and you might end up doing more computations than required.
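
To make the arithmetic concrete, here is a rough sketch (the feature size and random tensors are placeholders chosen only for illustration, not part of the original answer) that pads a batch with those lengths and compares the number of padded time steps with the number of steps that carry real data:

 import torch
 from torch.nn.utils.rnn import pad_sequence

 # Toy batch with the lengths from the example above (feature size 10 is arbitrary)
 lengths = [4, 6, 8, 5, 4, 3, 7, 8]
 seqs = [torch.randn(n, 10) for n in lengths]

 padded = pad_sequence(seqs, batch_first=True)      # shape: (8, 8, 10)
 padded_steps = padded.shape[0] * padded.shape[1]   # 8 x 8 = 64 time steps computed
 actual_steps = sum(lengths)                        # 45 time steps with real data
 print(padded_steps, actual_steps)                  # 64 45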

Instead, PyTorch allows us to pack the sequence; internally, a packed sequence is a tuple of two lists. One contains the elements of the sequences, interleaved by time step (see the example below); the other contains the batch size at each time step. This is helpful for recovering the actual sequences, as well as for telling the RNN what the batch size is at each time step. This has been pointed out by @Aerin. The packed sequence can be passed to the RNN, and it will internally optimize the computations.

I might have been unclear at some points, so let me know and I can add more explanations.

Here is a code example:

 import torch

 a = [torch.tensor([1, 2, 3]), torch.tensor([3, 4])]
 b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)
 # tensor([[ 1,  2,  3],
 #         [ 3,  4,  0]])
 torch.nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=[3, 2])
 # PackedSequence(data=tensor([ 1,  3,  2,  4,  3]), batch_sizes=tensor([ 2,  2,  1]))
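
As a follow-up, here is a rough sketch of what "passing it to the RNN" and recovering padded outputs might look like (the feature and hidden sizes below are arbitrary choices, not part of the original answer):

 import torch
 from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

 # Same two-sequence batch as above, but with a feature dimension so it can feed an LSTM
 a = [torch.randn(3, 5), torch.randn(2, 5)]          # lengths 3 and 2, feature size 5 (arbitrary)
 b = pad_sequence(a, batch_first=True)               # shape: (2, 3, 5)
 packed = pack_padded_sequence(b, lengths=[3, 2], batch_first=True)

 lstm = torch.nn.LSTM(input_size=5, hidden_size=4, batch_first=True)
 packed_out, (h_n, c_n) = lstm(packed)               # the LSTM skips the padded step entirely

 # Recover a padded (2, 3, 4) output tensor together with the original lengths
 out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
 print(out.shape, out_lengths)                       # torch.Size([2, 3, 4]) tensor([3, 2])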
