Why do we "pack" the sequences in PyTorch?


Question

I was trying to replicate How to use packing for variable-length sequence inputs for rnn but I guess I first need to understand why we need to "pack" the sequence.

I understand why we need to "pad" them but why is "packing" (through pack_padded_sequence) necessary?

Any high-level explanation would be appreciated!

Answer

I have stumbled upon this problem too and below is what I figured out.

When training an RNN (LSTM, GRU, or vanilla RNN), it is difficult to batch variable-length sequences. For example: if the lengths of the sequences in a batch of size 8 are [4, 6, 8, 5, 4, 3, 7, 8], you would pad all of them to the longest one, resulting in 8 sequences of length 8. You would end up doing 64 time-step computations (8x8), but you only needed to do 45 (the sum of the actual lengths). Moreover, if you wanted to do something fancy like using a bidirectional RNN, it would be harder to do batch computations just by padding, and you might end up doing more computations than required.
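
As a quick check of those numbers, here is a minimal sketch using only the lengths from the example above (the variable names are illustrative, not from the original answer):

import torch

lengths = torch.tensor([4, 6, 8, 5, 4, 3, 7, 8])

needed = lengths.sum().item()                  # time steps actually required: 45
padded = len(lengths) * lengths.max().item()   # time steps after padding: 8 * 8 = 64
print(needed, padded, padded - needed)         # 45 64 19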

Instead, PyTorch allows us to pack the sequence. Internally, a packed sequence is a tuple of two lists: one contains the elements of the sequences, interleaved by time step (see the example below), and the other contains the batch size at each time step. This is helpful for recovering the actual sequences, as well as for telling the RNN what the batch size is at each time step (as @Aerin has pointed out). The packed sequence can be passed to the RNN, which will internally optimize the computations.

I might have been unclear at some points, so let me know and I can add more explanations.

Here is a code example:

import torch

a = [torch.tensor([1, 2, 3]), torch.tensor([3, 4])]
b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)
print(b)
# tensor([[1, 2, 3],
#         [3, 4, 0]])

packed = torch.nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=[3, 2])
print(packed)
# PackedSequence(data=tensor([1, 3, 2, 4, 3]), batch_sizes=tensor([2, 2, 1]))
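
To show how such a packed sequence is actually consumed, here is a minimal sketch of feeding it to an RNN and unpacking the result. The input size of 1, the hidden size of 4, and the single-layer LSTM are arbitrary illustrative choices, not part of the original answer:

import torch

# Float tensors with a trailing feature dimension, so the LSTM accepts them.
a = [torch.tensor([1., 2., 3.]), torch.tensor([3., 4.])]
b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True).unsqueeze(-1)  # shape (2, 3, 1)
packed = torch.nn.utils.rnn.pack_padded_sequence(b, lengths=[3, 2], batch_first=True)

rnn = torch.nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
packed_out, (h_n, c_n) = rnn(packed)  # the LSTM skips the padded positions

# Recover a padded output tensor together with the true lengths.
out, out_lengths = torch.nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)    # torch.Size([2, 3, 4])
print(out_lengths)  # tensor([3, 2])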

