How to handle variable length data for LSTM


Question

From what I know, the general steps to preprocess data for an LSTM are as follows:

from tensorflow import keras

vocab_size = 20000  # Only consider the top 20k words
maxlen = 200  # Pad or truncate every movie review to 200 tokens
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")
x_train0 = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val0 = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

Here, x_train consists of 25,000 samples of variable length. After pad_sequences is applied, any sequence longer than 200 is truncated to length 200, and any sequence shorter than 200 is padded with 0s (with the default settings, both truncation and padding happen at the start of the sequence).
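As a rough illustration of the padding and truncation described above, here is a pure-Python sketch. The helper name pad_or_truncate is hypothetical and is not part of Keras; it only mimics what pad_sequences does with its default settings, as far as I understand them.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Hypothetical helper: return seq as a list of exactly maxlen items.

    Mimics the default pad_sequences behaviour: long sequences keep
    their last maxlen items, short ones get `value` added at the front.
    """
    seq = list(seq)[-maxlen:]                   # drop items from the start if too long
    return [value] * (maxlen - len(seq)) + seq  # pad at the start if too short


short_review = [11, 12, 13]
long_review = [1, 2, 3, 4, 5, 6]

print(pad_or_truncate(short_review, 5))  # padded at the front
print(pad_or_truncate(long_review, 4))   # truncated at the front
```

Either way the model receives sequences of one fixed length, which is why very long inputs lose information and very short inputs are mostly zeros.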

This isn't a big problem if your sequences have a length of 200 +/- 50, or range from 90 to 500.

But how do you handle data whose sequence lengths range from 100 to 60,000?
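To make the concern concrete, here is a small sketch of how much of a padded batch ends up being padding when every sequence is padded to the longest one. The function name padding_fraction and the example lengths are made up for illustration.

```python
def padding_fraction(lengths):
    """Fraction of positions that are padding when all sequences
    are padded to the length of the longest one."""
    longest = max(lengths)
    total_slots = longest * len(lengths)
    real_tokens = sum(lengths)
    return 1 - real_tokens / total_slots


narrow = [150, 200, 250]    # roughly 200 +/- 50
wide = [100, 500, 60000]    # the range asked about

print(f"narrow range: {padding_fraction(narrow):.0%} padding")
print(f"wide range:   {padding_fraction(wide):.0%} padding")
```

With the narrow range only a small share of each batch is padding, while with the wide range most of the positions are zeros. Truncating to a short maxlen instead would throw away most of the longest sequences.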

Recommended answer

**There is a way to handle this in the LSTM architecture:**

  1. In your LSTM layer, set the timestep component of the input_shape argument to None. This lets the layer accept sequences of variable length.

One problem then arises: the inputs have to fit into a NumPy array, which has a strict structure (every row must have the same length). What I do is group the inputs into batches of equal length and make an array out of each group, then feed those arrays to the network one at a time.

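Example:

import tensorflow as tf

# latent_dim and vocab_len are placeholders for your own sizes.
lstm = tf.keras.layers.LSTM(latent_dim, input_shape=(None, vocab_len))

# inputs_by_length is a placeholder: a dict mapping each sequence length
# to an (x, y) pair of arrays holding all samples of that length.
for length, (x, y) in inputs_by_length.items():
    model.fit(x, y, epochs=100)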
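The grouping step can be sketched with NumPy as follows. This is only one possible way to do it, not necessarily how the answerer implemented it. The function name bucket_by_length and the toy data are assumptions for illustration, and the sequences here are plain token ids rather than one-hot vectors.

```python
from collections import defaultdict

import numpy as np


def bucket_by_length(sequences, labels):
    """Group samples of equal length so each group forms a rectangular array.

    Returns a dict: length -> (x, y), where x has shape (n_samples, length).
    """
    groups = defaultdict(lambda: ([], []))
    for seq, label in zip(sequences, labels):
        xs, ys = groups[len(seq)]
        xs.append(seq)
        ys.append(label)
    return {
        length: (np.asarray(xs), np.asarray(ys))
        for length, (xs, ys) in groups.items()
    }


# Toy data: four sequences with two distinct lengths.
sequences = [[1, 2, 3], [4, 5], [6, 7, 8], [9, 10]]
labels = [0, 1, 1, 0]

buckets = bucket_by_length(sequences, labels)
for length, (x, y) in sorted(buckets.items()):
    print(length, x.shape, y.shape)
```

Each (x, y) pair can then be passed to the model, for example with model.fit(x, y) as in the answer, or with model.train_on_batch(x, y) inside an outer loop over epochs so that every epoch sees all lengths. When lengths span a very wide range, many groups may hold only a sample or two. A common variation, which is not what the answer describes, is to group by length range and pad only within each group.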

Please let me know if this works for your case; it works for me.
