PyTorch LSTM input dimension

Question

I'm trying to train a simple 2-layer neural network with PyTorch LSTMs and I'm having trouble interpreting the PyTorch documentation. Specifically, I'm not too sure how to go about with the shape of my training data.

What I want to do is train my network on a very large dataset through mini-batches, where each batch is, say, 100 elements long. Each data element will have 5 features. The documentation states that the input to the layer should be of shape (seq_len, batch_size, input_size). How should I go about shaping the input?

I've been following this post: https://discuss.pytorch.org/t/understanding-lstm-input/31110/3 and if I'm interpreting this correctly, each minibatch should be of shape (100, 100, 5). But in this case, what's the difference between seq_len and batch_size? Also, would this mean that the input LSTM layer should have 5 units?
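For concreteness, here is a minimal sketch of the shape I think the documentation is describing (hidden_size and num_layers are arbitrary values picked just for illustration):

```python
import torch
import torch.nn as nn

# 5 features per time step, as in my data; hidden_size/num_layers are arbitrary.
lstm = nn.LSTM(input_size=5, hidden_size=32, num_layers=2)

seq_len, batch_size, input_size = 10, 100, 5
x = torch.randn(seq_len, batch_size, input_size)  # (seq_len, batch_size, input_size)

output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([10, 100, 32]) -> (seq_len, batch_size, hidden_size)
```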

Thanks!

Answer

This is an old question, but since it has been viewed 80+ times with no response, let me take a crack at it.

An LSTM network is used to predict a sequence. In NLP, that would be a sequence of words; in economics, a sequence of economic indicators; etc.

The first parameter is the length of those sequences. If your sequence data is made of sentences, then "Tom has a black and ugly cat" is a sequence of length 7 (seq_len), one for each word, and maybe an 8th to indicate the end of the sentence.
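As a concrete sketch, that sentence becomes a sequence of 8 token ids, i.e. seq_len = 8 for a batch of one (the vocabulary and its indices below are made up purely for illustration, with index 0 reserved for an empty/padding token):

```python
import torch

# Hypothetical vocabulary; the indices are arbitrary illustration values.
vocab = {"[PAD]": 0, "[EOS]": 1, "Tom": 2, "has": 3, "a": 4,
         "black": 5, "and": 6, "ugly": 7, "cat": 8}

tokens = "Tom has a black and ugly cat".split() + ["[EOS]"]
ids = torch.tensor([vocab[t] for t in tokens])  # shape: (8,) -> seq_len = 8
x = ids.unsqueeze(1)                            # shape: (seq_len=8, batch_size=1)
print(x.shape)  # torch.Size([8, 1])
```

These ids would normally go through an embedding layer before being fed to the LSTM.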

Of course, you might object "what if my sequences are of varying length?", which is a common situation.

The two most common solutions are:

  1. Pad your sequences with empty elements. For instance, if the longest sentence you have has 15 words, then encode the sentence above as "[Tom] [has] [a] [black] [and] [ugly] [cat] [EOS] [] [] [] [] [] [] []", where EOS stands for end of sentence. Suddenly, all your sequences become of length 15, which solves your issue. As soon as the [EOS] token is found, the model will learn quickly that it is followed by an unlimited sequence of empty tokens [], and that approach will barely tax your network (see the padding sketch after this list).

  2. Send mini-batches of equal lengths. For instance, train the network on all sentences with 2 words, then with 3, then with 4. Of course, seq_len will be increased at each mini-batch, and the size of each mini-batch will vary based on how many sequences of length N you have in your data.
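A minimal sketch of option 1, using torch.nn.utils.rnn.pad_sequence to append padding at the end of the shorter sequences (the token indices are the same made-up illustration values as above, with 0 as the empty/padding token):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length sequences of token ids; 1 is [EOS], 0 is the empty token.
seqs = [
    torch.tensor([2, 3, 4, 5, 6, 7, 8, 1]),  # "Tom has a black and ugly cat [EOS]"
    torch.tensor([2, 3, 8, 1]),              # a shorter sentence
    torch.tensor([5, 8, 1]),                 # an even shorter one
]

# pad_sequence appends padding_value to the shorter sequences and returns a
# tensor of shape (max_seq_len, batch_size) with the default batch_first=False.
padded = pad_sequence(seqs, padding_value=0)
print(padded.shape)   # torch.Size([8, 3])
print(padded[:, 1])   # tensor([2, 3, 8, 1, 0, 0, 0, 0]) -- padded at the end
```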

A best-of-both-worlds approach would be to divide your data into mini-batches of roughly equal sizes, grouping them by approximate length and adding only the necessary padding. For instance, if you mini-batch together sentences of length 6, 7 and 8, then sequences of length 8 will require no padding, whereas sequences of length 6 will require only 2. If you have a large dataset with sequences of widely varying length, that's the best approach.
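A rough sketch of that grouping idea (the lengths and batch size below are made up for illustration): sort by length, cut into chunks of similar-length sequences, and pad only within each chunk.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Variable-length sequences of token ids; lengths/contents are illustrative.
sequences = [torch.randint(1, 100, (n,)) for n in [6, 8, 7, 15, 6, 14, 8, 13]]
sequences.sort(key=len)  # put similar lengths next to each other

batch_size = 4
for i in range(0, len(sequences), batch_size):
    chunk = sequences[i:i + batch_size]
    batch = pad_sequence(chunk, padding_value=0)  # (max_len_in_chunk, batch_size)
    print(batch.shape)
# torch.Size([8, 4])   -- lengths 6, 6, 7, 8: at most 2 padding tokens each
# torch.Size([15, 4])  -- lengths 8, 13, 14, 15
```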

Option 1 is the easiest (and laziest) approach, though, and will work great on small datasets.

One last thing... Always pad your data at the end, not at the beginning.

I hope this helps.
