How to correctly give inputs to Embedding, LSTM and Linear layers in PyTorch?

Question

I need some clarity on how to correctly prepare inputs for batch-training using different components of the torch.nn module. Specifically, I'm looking to create an encoder-decoder network for a seq2seq model.

Suppose I have a module with these three layers, in order:

  1. nn.Embedding
  2. nn.LSTM
  3. nn.Linear

nn.Embedding

Input: batch_size * seq_length
Output: batch_size * seq_length * embedding_dimension

I don't have any problems here, I just want to be explicit about the expected shape of the input and output.

nn.LSTM

Input: seq_length * batch_size * input_size (embedding_dimension in this case)
Output: seq_length * batch_size * hidden_size
last_hidden_state: batch_size * hidden_size
last_cell_state: batch_size * hidden_size

To use the output of the Embedding layer as input for the LSTM layer, I need to transpose axis 1 and 2.

Many examples I've found online do something like x = embeds.view(len(sentence), self.batch_size, -1), but that confuses me. How does this view ensure that elements of the same batch remain in the same batch? What happens when len(sentence) and self.batch_size are the same size?

nn.Linear

Input: batch_size x input_size (hidden_size of LSTM in this case or ??)
Output: batch_size x output_size

If I only need the last_hidden_state of LSTM, then I can give it as input to nn.Linear.

But if I want to make use of Output (which contains all the intermediate hidden states as well), then I need to change nn.Linear's input size to seq_length * hidden_size. To use Output as input to the Linear module, I need to transpose axes 1 and 2 of Output, and then I can reshape it with Output_transposed.view(batch_size, -1).

Is my understanding here correct? How do I carry out these transpose operations in tensors (tensor.transpose(0, 1))?

Recommended answer

Your understanding of most of the concepts is accurate, but there are a few gaps here and there.

You have the embedding output in the shape (batch_size, seq_len, embedding_size). There are two ways to pass this to the LSTM.
* You can pass it directly, if the LSTM accepts batch-first input. To get that, pass the argument batch_first=True when creating your LSTM.
* Or, you can pass the input in the shape (seq_len, batch_size, embedding_size). To convert your embedding output to this shape, you'll need to transpose the first and second dimensions using torch.transpose(tensor_name, 0, 1), like you mentioned.
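Both options can be sketched side by side. This is only an illustration: the sizes and the names lstm_bf and lstm_sf are made up for the example and do not come from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, vocab_size = 3, 5, 50      # illustrative sizes (assumptions)
embedding_size, hidden_size = 8, 6

embedding = nn.Embedding(vocab_size, embedding_size)
token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
embeds = embedding(token_ids)                    # (batch_size, seq_len, embedding_size)

# Option 1: an LSTM created with batch_first=True takes embeds as-is
lstm_bf = nn.LSTM(embedding_size, hidden_size, batch_first=True)
out_bf, (h_bf, c_bf) = lstm_bf(embeds)           # out_bf: (batch_size, seq_len, hidden_size)

# Option 2: default layout, so swap the first two dimensions first
lstm_sf = nn.LSTM(embedding_size, hidden_size)
out_sf, (h_sf, c_sf) = lstm_sf(embeds.transpose(0, 1))   # out_sf: (seq_len, batch_size, hidden_size)

print(out_bf.shape)   # torch.Size([3, 5, 6])
print(out_sf.shape)   # torch.Size([5, 3, 6])
print(h_bf.shape)     # torch.Size([1, 3, 6])
```

Note that batch_first only changes the layout of the input and output tensors. The returned hidden and cell states keep the batch in the second dimension in both cases.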

Q. I see many examples online which do something like x = embeds.view(len(sentence), self.batch_size, -1), which confuses me.
A. This is wrong. It mixes up elements from different sequences in the batch, so the network is asked to learn from scrambled inputs, which is a hopeless task. Wherever you see this, you can tell the author to change the statement and use transpose instead.
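A tiny example makes the difference visible. The tensor values are chosen by hand for illustration, so that each number shows which sequence it came from.

```python
import torch

batch_size, seq_len = 2, 3
# Sequence 0 holds 0, 1, 2 and sequence 1 holds 10, 11, 12 (embedding_size is 1)
embeds = torch.tensor([[0., 1., 2.], [10., 11., 12.]]).unsqueeze(-1)   # (2, 3, 1)

viewed = embeds.view(seq_len, batch_size, -1)   # reinterprets the memory order
transposed = embeds.transpose(0, 1)             # actually swaps the two axes

# Batch column 0 should be sequence 0 read over time
print(viewed[:, 0, 0].tolist())       # [0.0, 2.0, 11.0]  (mixes both sequences)
print(transposed[:, 0, 0].tolist())   # [0.0, 1.0, 2.0]
```

When len(sentence) happens to equal batch_size, the view leaves the tensor unchanged and raises no error, so the batch axis is silently treated as the time axis.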

There is an argument in favor of not using batch_first, which states that the underlying API provided by Nvidia CUDA runs considerably faster when the batch is the second dimension.

You are feeding the embedding output directly to the LSTM, which fixes the LSTM's input to a context size of 1. This means that if your inputs are words, you will always be giving it one word at a time. That is not always what we want, so you may need to expand the context size. This can be done as follows:

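# Assuming that embeds is the embedding output, shape (batch_size, seq_len, embedding_size),
# and context_size is a defined variable
embeds = embeds.unfold(1, context_size, 1)  # Keeping the step size to be 1
# unfold returns a non-contiguous view, so use reshape here (a plain .view raises an error)
embeds = embeds.reshape(embeds.size(0), embeds.size(1), -1)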

See the torch.Tensor.unfold documentation for details.
Now, you can proceed as mentioned above to feed this to the LSTM. Just remember that seq_len has now changed to seq_len - context_size + 1, and embedding_size (which is the input size of the LSTM) has now changed to context_size * embedding_size.
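The shape bookkeeping can be checked on a small random tensor. In this sketch the sizes are illustrative, and the permute step is an addition of mine (an assumption about the intended layout): it makes each window the concatenation of context_size consecutive word vectors instead of an interleaving of their components.

```python
import torch

batch_size, seq_len, embedding_size, context_size = 2, 6, 4, 3   # illustrative sizes
embeds = torch.randn(batch_size, seq_len, embedding_size)

windows = embeds.unfold(1, context_size, 1)
# windows: (batch_size, seq_len - context_size + 1, embedding_size, context_size)

# Move the context axis in front of the embedding axis, then flatten the last two axes
lstm_ready = windows.permute(0, 1, 3, 2).reshape(
    batch_size, seq_len - context_size + 1, -1
)

print(windows.shape)      # torch.Size([2, 4, 4, 3])
print(lstm_ready.shape)   # torch.Size([2, 4, 12])

# Window 1 of sequence 0 is words 1, 2 and 3 laid end to end
print(torch.equal(lstm_ready[0, 1], embeds[0, 1:4].reshape(-1)))   # True
```

An LSTM consuming lstm_ready would be created with input_size = context_size * embedding_size, which is 12 here.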

The input size of different instances in a batch will not always be the same. For example, some of your sentences might be 10 words long, some 15, and some 1000. So, you definitely want to feed variable-length sequences to your recurrent unit. To do this, some additional steps need to be performed before you can feed your input to the network. You can follow these steps:
1. Sort your batch from the longest sequence to the shortest.
2. Create a seq_lengths array that defines the length of each sequence in the batch. (This can be a simple Python list.)
3. Pad all the sequences to the length of the longest sequence.
4. Create a LongTensor from this padded batch.
5. Now, after passing the above tensor through the embedding and creating the proper context size input, you'll need to pack your sequence as follows:

# Assuming embeds to be the proper input to the LSTM
lstm_input = nn.utils.rnn.pack_padded_sequence(embeds, [x - context_size + 1 for x in seq_lengths], batch_first=False)
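The five steps above can be sketched end to end as follows. This is a minimal illustration with made-up token ids and sizes. It keeps context_size at 1 (no unfold), assumes index 0 is reserved for padding, and uses pad_sequence from torch.nn.utils.rnn for the padding step, which the answer does not name.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

vocab_size, embedding_size = 20, 5   # illustrative sizes (assumptions)
# Three token-id sequences of different lengths; id 0 is assumed to mean "padding"
sequences = [torch.tensor([4, 7, 2]), torch.tensor([3, 9, 8, 1, 6]), torch.tensor([5, 2])]

# Steps 1-2: sort longest first and record the lengths
sequences.sort(key=len, reverse=True)
seq_lengths = [len(s) for s in sequences]

# Steps 3-4: pad to the longest sequence; result is a LongTensor of shape (max_len, batch_size)
padded = pad_sequence(sequences, batch_first=False, padding_value=0)

# Step 5: embed, then pack so the LSTM skips the padded positions
embedding = nn.Embedding(vocab_size, embedding_size, padding_idx=0)
embeds = embedding(padded)           # (max_len, batch_size, embedding_size)
lstm_input = pack_padded_sequence(embeds, seq_lengths, batch_first=False)

print(seq_lengths)                        # [5, 3, 2]
print(padded.shape)                       # torch.Size([5, 3])
print(lstm_input.batch_sizes.tolist())    # [3, 3, 2, 1, 1]
```

The batch_sizes field records how many sequences are still active at each time step, which is how the packed representation lets the LSTM skip padding.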

Understanding output of LSTM

Now, once you have prepared your lstm_input according to your needs, you can call the lstm as

lstm_outs, (h_t, h_c) = lstm(lstm_input, (h_t, h_c))

Here, the (h_t, h_c) passed in is the initial hidden and cell state (if it is omitted, PyTorch uses zeros), and the call returns the final hidden and cell state. This also shows why packing variable-length sequences is required: otherwise the LSTM would run over the padded positions as well.
Now, lstm_outs will be a packed sequence holding the output of the lstm at every step, and (h_t, h_c) are the final hidden state and the final cell state respectively. h_t and h_c will each be of shape (num_layers * num_directions, batch_size, lstm_size), which is (1, batch_size, lstm_size) for a single-layer, unidirectional LSTM. You can use these directly as further input, but if you want to use the intermediate outputs as well, you'll need to unpack lstm_outs first, as below

lstm_outs, _ = nn.utils.rnn.pad_packed_sequence(lstm_outs)

Now, your lstm_outs will be of shape (max_seq_len - context_size + 1, batch_size, lstm_size). Now, you can extract the intermediate outputs of lstm according to your need.

Remember that the unpacked output will contain zeros after the end of each sequence. This is just padding to match the length of the longest sequence (which is always the first one, as we sorted the input from longest to shortest).

Also note that, for a single-layer, unidirectional LSTM, h_t will always be equal to the last valid (non-padded) output of each sequence in the batch.
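Both remarks can be checked numerically. The sketch below uses random inputs in place of real embeddings and illustrative sizes. The names packed_outs, out_lengths and last_valid are introduced only for this example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
max_len, batch_size, input_size, lstm_size = 5, 3, 4, 6   # illustrative sizes
seq_lengths = [5, 3, 2]                                    # sorted, longest first

inputs = torch.randn(max_len, batch_size, input_size)      # stands in for the embedded batch
lstm = nn.LSTM(input_size, lstm_size)                      # one layer, unidirectional

packed = pack_padded_sequence(inputs, seq_lengths)
packed_outs, (h_t, c_t) = lstm(packed)                     # initial state defaults to zeros
lstm_outs, out_lengths = pad_packed_sequence(packed_outs)  # (max_len, batch_size, lstm_size)

# Pick the output at the last valid time step of every sequence
last_valid = torch.stack(
    [lstm_outs[length - 1, i] for i, length in enumerate(seq_lengths)]
)

print(lstm_outs.shape, h_t.shape)           # torch.Size([5, 3, 6]) torch.Size([1, 3, 6])
print(torch.allclose(last_valid, h_t[0]))   # True
print(bool((lstm_outs[3:, 1] == 0).all()))  # True: sequence 1 has length 3, the rest is padding
```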

Interfacing lstm to linear

Now, if you want to use just the final output of the lstm, you can directly feed h_t to your linear layer and it will work. But if you want to use the intermediate outputs as well, you'll need to figure out how you are going to feed them to the linear layer (through some attention network or some pooling). You do not want to feed the complete sequence to the linear layer, because different sequences have different lengths and you can't fix the input size of the linear layer. And yes, you'll need to transpose the output of the lstm before using it further (again, you cannot use view here).
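As a rough sketch of the two routes, the example below feeds the final hidden state to a linear layer, and separately applies a length-aware mean over time before the same layer. Mean pooling is just one simple choice of pooling picked for illustration. The tensors are random stand-ins for real LSTM outputs, and all sizes and names are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
max_len, batch_size, lstm_size, output_size = 5, 3, 6, 2   # illustrative sizes
seq_lengths = torch.tensor([5, 3, 2])

# Stand-ins for the unpacked LSTM outputs and the final hidden state
lstm_outs = torch.randn(max_len, batch_size, lstm_size)
h_t = torch.randn(1, batch_size, lstm_size)

linear = nn.Linear(lstm_size, output_size)

# Route 1: final hidden state only
logits_last = linear(h_t[-1])                               # (batch_size, output_size)

# Route 2: mean over the valid time steps, ignoring padded positions
steps = torch.arange(max_len).unsqueeze(1)                  # (max_len, 1)
mask = (steps < seq_lengths.unsqueeze(0)).unsqueeze(-1)     # (max_len, batch_size, 1)
pooled = (lstm_outs * mask).sum(dim=0) / seq_lengths.unsqueeze(1)
logits_pooled = linear(pooled)                              # (batch_size, output_size)

print(logits_last.shape, logits_pooled.shape)   # torch.Size([3, 2]) torch.Size([3, 2])
```

Either way the linear layer's input size stays at lstm_size, independent of how long the sequences are.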

Ending note: I have purposefully left out some points, such as using bidirectional recurrent cells, using a step size in unfold, and interfacing with attention, as they can get quite cumbersome and are out of the scope of this answer.
