How to correctly give inputs to Embedding, LSTM and Linear layers in PyTorch?

Question

I need some clarity on how to correctly prepare inputs for batch-training using different components of the torch.nn module. Specifically, I'm looking to create an encoder-decoder network for a seq2seq model.

Suppose I have a module with these three layers, in order:

  1. nn.Embedding
  2. nn.LSTM
  3. nn.Linear

nn.Embedding

Input: batch_size * seq_length
Output: batch_size * seq_length * embedding_dimension
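
For instance, a quick shape check (all sizes here are made up):

import torch
import torch.nn as nn

vocab_size, embedding_dimension = 100, 8
batch_size, seq_length = 4, 5

embedding = nn.Embedding(vocab_size, embedding_dimension)
tokens = torch.randint(0, vocab_size, (batch_size, seq_length))
print(embedding(tokens).shape)  # torch.Size([4, 5, 8]) -> batch_size * seq_length * embedding_dimension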

I don't have any problems here, I just want to be explicit about the expected shape of the input and output.

nn.LSTM

Input: seq_length * batch_size * input_size (embedding_dimension in this case)
Output: seq_length * batch_size * hidden_size
last_hidden_state: batch_size * hidden_size
last_cell_state: batch_size * hidden_size
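
A quick check of these shapes (made-up sizes; note that PyTorch actually returns the hidden and cell states with an extra leading num_layers * num_directions dimension):

import torch
import torch.nn as nn

seq_length, batch_size, embedding_dimension, hidden_size = 5, 4, 8, 16
lstm = nn.LSTM(input_size=embedding_dimension, hidden_size=hidden_size)

x = torch.randn(seq_length, batch_size, embedding_dimension)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 4, 16]) -> seq_length * batch_size * hidden_size
print(h_n.shape)     # torch.Size([1, 4, 16]) -> (num_layers * num_directions) * batch_size * hidden_size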

To use the output of the Embedding layer as input for the LSTM layer, I need to transpose axes 1 and 2.

Many examples I've found online do something like x = embeds.view(len(sentence), self.batch_size, -1), but that confuses me. How does this view ensure that elements of the same batch remain in the same batch? What happens when len(sentence) and self.batch_size have the same value?

nn.Linear

Input: batch_size x input_size (hidden_size of LSTM in this case or ??)
Output: batch_size x output_size

If I only need the last_hidden_state of LSTM, then I can give it as input to nn.Linear.

But if I want to make use of Output (which contains all the intermediate hidden states as well), then I need to change nn.Linear's input size to seq_length * hidden_size; to use Output as input to the Linear module, I need to transpose axes 1 and 2 of the output, after which I can view it with Output_transposed(batch_size, -1).

Is my understanding here correct? How do I carry out these transpose operations in tensors (tensor.transpose(0, 1))?

Answer

Your understanding of most of the concepts is accurate, but there are some missing points here and there.

You have the embedding output in the shape of (batch_size, seq_len, embedding_size). Now, there are various ways through which you can pass this to the LSTM (see the sketch after this list).
* You can pass it directly to the LSTM if the LSTM accepts its input as batch_first. So, pass the argument batch_first=True while creating your LSTM.
* Or, you can pass the input in the shape of (seq_len, batch_size, embedding_size). To convert your embedding output to this shape, transpose the first and second dimensions using torch.transpose(tensor_name, 0, 1), like you mentioned.
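
A minimal sketch of both options (all sizes are made up for illustration):

import torch
import torch.nn as nn

batch_size, seq_len, embedding_size, hidden_size = 4, 5, 8, 16
embeds = torch.randn(batch_size, seq_len, embedding_size)

# Option 1: keep the batch dimension first
lstm_bf = nn.LSTM(embedding_size, hidden_size, batch_first=True)
out, _ = lstm_bf(embeds)  # out: (batch_size, seq_len, hidden_size)

# Option 2: transpose to (seq_len, batch_size, embedding_size)
lstm = nn.LSTM(embedding_size, hidden_size)
out, _ = lstm(embeds.transpose(0, 1))  # out: (seq_len, batch_size, hidden_size)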

Q. I see many examples online which do something like x = embeds.view(len(sentence), self.batch_size, -1), which confuses me.
A. This is wrong. It will mix up the batches and you will be trying to learn a hopeless learning task. Wherever you see this, you can tell the author to change the statement and use transpose instead.
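
A small demonstration of the difference, using a made-up 2 x 3 tensor where each row is one batch element:

import torch

x = torch.arange(6).reshape(2, 3)  # rows: [0, 1, 2] and [3, 4, 5]
print(x.transpose(0, 1))  # tensor([[0, 3], [1, 4], [2, 5]]) -- each column is still one batch element
print(x.view(3, 2))       # tensor([[0, 1], [2, 3], [4, 5]]) -- row [2, 3] mixes the two batch elements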

There is an argument in favor of not using batch_first, which states that the underlying API provided by Nvidia CUDA runs considerably faster with the batch as the second dimension.

If you feed the embedding output directly to the LSTM, this fixes the input size of the LSTM to a context size of 1. This means that if your inputs are words to the LSTM, you will always be giving it one word at a time. But this is not what we want all the time. So, you need to expand the context size. This can be done as follows -

# Assuming that embeds is the embedding output of shape (batch_size, seq_len, embedding_size)
# and context_size is a defined variable
embeds = embeds.unfold(1, context_size, 1)  # step size 1; shape: (batch_size, seq_len - context_size + 1, embedding_size, context_size)
embeds = embeds.view(embeds.size(0), embeds.size(1), -1)  # shape: (batch_size, seq_len - context_size + 1, context_size * embedding_size)
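
For concreteness, here is what the shapes look like with made-up sizes:

import torch

batch_size, seq_len, embedding_size, context_size = 2, 6, 4, 3
embeds = torch.randn(batch_size, seq_len, embedding_size)

embeds = embeds.unfold(1, context_size, 1)
print(embeds.shape)  # torch.Size([2, 4, 4, 3]) -> (batch_size, seq_len - context_size + 1, embedding_size, context_size)

embeds = embeds.view(embeds.size(0), embeds.size(1), -1)
print(embeds.shape)  # torch.Size([2, 4, 12]) -> (batch_size, seq_len - context_size + 1, context_size * embedding_size)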

Unfold documentation
Now, you can proceed as mentioned above to feed this to the LSTM; just remember that seq_len is now changed to seq_len - context_size + 1 and embedding_size (which is the input size of the LSTM) is now changed to context_size * embedding_size.

The input sizes of different instances in a batch will not always be the same. For example, some of your sentences might be 10 words long, some might be 15, and some might be 1000. So, you definitely want variable-length sequence input to your recurrent unit. To do this, there are some additional steps that need to be performed before you can feed your input to the network. You can follow these steps -
1. Sort your batch from the largest sequence to the smallest.
2. Create a seq_lengths array that defines the length of each sequence in the batch. (This can be a simple Python list.)
3. Pad all the sequences to be of equal length to the largest sequence.
4. Create a LongTensor Variable of this batch.
5. Now, after passing the above variable through the embedding and creating the proper context-size input, you'll need to pack your sequence as follows -

# Assuming embeds to be the proper input to the LSTM, shaped (seq_len, batch_size, input_size)
# since batch_first=False below; the unfold above shortened each sequence by context_size - 1
lstm_input = nn.utils.rnn.pack_padded_sequence(embeds, [x - context_size + 1 for x in seq_lengths], batch_first=False)
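
Steps 1 to 4 might look like this (a sketch with made-up data; pad_sequence is one convenient way to do the padding):

import torch
from torch.nn.utils.rnn import pad_sequence

# Steps 1 and 2: sort from the largest sequence to the smallest and record the lengths
sequences = [torch.tensor([1, 2]), torch.tensor([3, 4, 5, 6]), torch.tensor([7, 8, 9])]
sequences.sort(key=len, reverse=True)
seq_lengths = [len(s) for s in sequences]  # [4, 3, 2]

# Steps 3 and 4: pad to the longest sequence and stack into a LongTensor of shape (max_seq_len, batch_size)
batch = pad_sequence(sequences)  # padded with 0s
print(batch.shape, seq_lengths)  # torch.Size([4, 3]) [4, 3, 2]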

Understanding output of LSTM

Now, once you have prepared your lstm_input according to your needs, you can call the lstm as

lstm_outs, (h_t, h_c) = lstm(lstm_input, (h_t, h_c))
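
If you don't already have a state to carry over (e.g. from a previous chunk), a zero initial state is the usual choice; this is a sketch assuming a single-layer, unidirectional LSTM, with batch_size and lstm_size as assumed names. Passing no state at all makes PyTorch default to zeros as well.

h_t = torch.zeros(1, batch_size, lstm_size)  # (num_layers * num_directions, batch_size, lstm_size)
h_c = torch.zeros(1, batch_size, lstm_size)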

Here, (h_t, h_c) needs to be provided as the initial hidden state, and the call will output the final hidden state. You can see why packing variable-length sequences is required; otherwise the LSTM will run over the unneeded padded words as well.
Now, lstm_outs will be a packed sequence, which is the output of the lstm at every step, and (h_t, h_c) are the final hidden state and the final cell state, respectively. h_t and h_c will be of shape (num_layers * num_directions, batch_size, lstm_size), i.e. (1, batch_size, lstm_size) for a single-layer, unidirectional LSTM. You can use these directly for further input, but if you want to use the intermediate outputs as well, you'll need to unpack lstm_outs first, as below

lstm_outs, _ = nn.utils.rnn.pad_packed_sequence(lstm_outs)  # the discarded second value holds the sequence lengths

Your lstm_outs will now be of shape (max_seq_len - context_size + 1, batch_size, lstm_size), and you can extract the intermediate outputs of the lstm according to your need.

Remember that the unpacked output will have 0s after the length of each sequence, which is just padding to match the length of the largest sequence (which is always the first one, as we sorted the input from the largest to the smallest).

Also note that h_t will always be equal to the last valid (non-padded) element of each sequence's output.
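
You can check this by gathering the last valid timestep of each sequence from the unpacked output (a sketch reusing the names from above; the packed lengths are seq_lengths reduced by context_size - 1):

lengths = torch.tensor([x - context_size + 1 for x in seq_lengths])
batch_idx = torch.arange(len(seq_lengths))
last_outs = lstm_outs[lengths - 1, batch_idx]  # shape: (batch_size, lstm_size)
# for a single-layer, unidirectional LSTM this matches h_t.squeeze(0)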

Interfacing lstm to linear

Now, if you want to use just the final output of the lstm, you can directly feed h_t to your linear layer and it will work. But if you want to use the intermediate outputs as well, then you'll need to figure out how you are going to input them to the linear layer (through some attention network or some pooling). You do not want to input the complete sequence to the linear layer, as different sequences will be of different lengths and you can't fix the input size of the linear layer. And yes, you'll need to transpose the output of the lstm to use it further (again, you cannot use view here).
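
For the simple case, a sketch of feeding h_t to a linear layer, plus one possible pooling over the intermediate outputs (output_size is an assumed name, and lengths is the tensor of packed lengths from above; the squeeze removes the leading num_layers * num_directions dimension):

linear = nn.Linear(lstm_size, output_size)
logits = linear(h_t.squeeze(0))  # shape: (batch_size, output_size)

# One simple pooling alternative: average the unpacked outputs over the valid timesteps
mask = (torch.arange(lstm_outs.size(0)).unsqueeze(1) < lengths.unsqueeze(0)).unsqueeze(2).float()
pooled = (lstm_outs * mask).sum(0) / lengths.float().unsqueeze(1)  # shape: (batch_size, lstm_size)
logits = linear(pooled)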

Ending Note: I have purposefully left out some points, such as using bidirectional recurrent cells, using a step size in unfold, and interfacing attention, as they can get quite cumbersome and would be out of the scope of this answer.
