Setting up the input on an RNN in Keras

Question

I have a specific question about setting up the input in Keras.

I understand that the sequence length refers to the window length of the longest sequence that you are looking to model, with the rest being padded with zeros.

However, how do I set this up for data that is already in a time series array?

For example, right now I have an array that is 550k x 28. So there are 550k rows, each with 28 columns (27 features and 1 target). Do I have to manually split the array into (550k - sequence length) different arrays and feed all of those to the network?

Assuming that I want the first layer to match the number of features per row, and to look at the past 50 rows, how do I size the input layer?

Is that simply input_size = (50,27)? And do I have to manually split the dataset up, or would Keras do that for me automatically?

Recommended answer

RNN inputs have the shape: (NumberOfSequences, TimeSteps, ElementsPerStep)

  • NumberOfSequences: each sequence is one entry along the first axis of your input array. This is also called the "batch size", number of examples, samples, etc.

  • TimeSteps: the number of steps in each sequence.

  • ElementsPerStep: how much information you have at each step of a sequence.

I'm assuming the 27 features are inputs and correspond to ElementsPerStep, while the 1 target is the expected output, with 1 output per step. So I'm also assuming that your output is a sequence with 550k steps as well.

Shaping the array:

Since you have only one sequence in the array, and this sequence has 550k steps, you must reshape your array like this:

(1, 550000, 28) 
    #1 sequence
    #550000 steps per sequence    
    #28 data elements per step

#PS: this sequence is very long; if it causes memory problems, it may be a good idea to use a `stateful=True` RNN, but I'm explaining the non-stateful method first.

Now you must split this array into inputs and targets:

X_train = thisArray[:, :, :27] #inputs
Y_train = thisArray[:, :,  27] #targets
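As a minimal sketch of the reshape and split above, assuming the data lives in a NumPy array. The name `raw` and the toy size of 1000 rows are placeholders standing in for the real 550k x 28 array.

```python
import numpy as np

# Toy stand-in for the real (550000, 28) array: 1000 rows, 27 features + 1 target.
n_steps, n_columns = 1000, 28
raw = np.random.rand(n_steps, n_columns)

# Add a leading axis: one sequence containing all the steps.
this_array = raw.reshape(1, n_steps, n_columns)

# Slicing with 27: (instead of the index 27) keeps the last axis,
# so the targets have shape (1, n_steps, 1).
X_train = this_array[:, :, :27]
Y_train = this_array[:, :, 27:]

print(X_train.shape, Y_train.shape)
```

Keeping the trailing axis on the targets makes them line up with a recurrent layer that has 1 cell and return_sequences=True, whose output is (batch, steps, 1).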

Shaping the Keras layers:

Keras layers ignore the batch size (number of sequences) when you define them, so you will use input_shape=(550000,27).

Since your desired result is a sequence of the same length, we will use return_sequences=True. (Otherwise, you'd get only one result per sequence.)

 LSTM(numberOfCells, input_shape=(550000,27), return_sequences=True)

This will output (BatchSize, 550000, numberOfCells)

You may use a single layer with 1 cell to produce your output, or you could stack more layers, as long as the last one has 1 cell to match the shape of your output. (If you're using only recurrent layers, of course.)
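A minimal sketch of such a stack, assuming TensorFlow's bundled Keras (`tensorflow.keras`) is installed. The 200 time steps and the 16 cells in the first layer are arbitrary placeholders chosen to keep the example small; the setup described here would use 550000 steps.

```python
from tensorflow import keras

time_steps, n_features = 200, 27  # placeholder sizes, not the real 550000 steps

model = keras.Sequential([
    keras.Input(shape=(time_steps, n_features)),   # batch size is left out
    keras.layers.LSTM(16, return_sequences=True),  # -> (batch, 200, 16)
    keras.layers.LSTM(1, return_sequences=True),   # -> (batch, 200, 1)
])
model.compile(optimizer="adam", loss="mse")

print(model.output_shape)
```

Keep in mind that an LSTM's default output is bounded between -1 and 1, so targets outside that range would need rescaling before training against a purely recurrent stack like this one.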

stateful=True:

When your sequences are so long that your memory can't handle them well, you must define the layer with stateful=True.

In that case, you will have to divide X_train into sequences of smaller length*. The system will understand that every new batch is a continuation of the previous batches.

Then you will need to define batch_input_shape=(BatchSize,ReducedTimeSteps,Elements). Here the batch size is part of the shape and cannot be left out, unlike in the non-stateful case.

* Unfortunately I have no experience with stateful=True. I'm not sure whether you must manually divide your array (less likely, I guess), or whether the system divides it internally (more likely).
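A minimal NumPy sketch of dividing the array by hand, assuming one long sequence cut into consecutive, non-overlapping chunks. The names `chunk_len` and `chunks` and the toy sizes are illustrative only, not part of any Keras API.

```python
import numpy as np

# Toy stand-in: one sequence of 1000 steps with 27 features each.
X_train = np.random.rand(1, 1000, 27)
chunk_len = 50  # hypothetical value for ReducedTimeSteps

n_chunks = X_train.shape[1] // chunk_len  # number of full chunks
usable = n_chunks * chunk_len             # drop any leftover steps at the end

# List of arrays, each of shape (1, chunk_len, 27), in time order.
chunks = np.split(X_train[:, :usable], n_chunks, axis=1)

print(len(chunks), chunks[0].shape)
```

Each chunk would then be fed to the stateful layer in order as a batch of size 1, so that the state left by one chunk is the starting state for the next.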

In this case, what I often see is people dividing the input data like this:

From the 550k steps, get smaller arrays with 50 steps each:

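import numpy as np

# originalX is the (550000, 28) array: 27 features plus the target in the last column
X = []

for i in range(550000-49):
    X.append(originalX[i:i+50, :27]) #50-step window, features only (28th element left out)

X = np.array(X)         #shape (549951, 50, 27)
Y = originalX[49:, 27]  #one target per window: the first 49 targets are excluded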
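The loop above copies every window, which adds up quickly for 550k rows of 50 x 27 values. As a sketch of the same windowing without the copies, assuming NumPy 1.20 or newer (where `sliding_window_view` is available) and a toy `originalX`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy stand-in for the real (550000, 28) array.
originalX = np.random.rand(1000, 28)
window = 50

features = originalX[:, :27]
targets = originalX[:, 27]

# sliding_window_view puts the window axis last: (951, 27, 50).
# Swap the last two axes to get (samples, time steps, features).
X = sliding_window_view(features, window, axis=0).transpose(0, 2, 1)

# One target per window: the value at the window's last row.
Y = targets[window - 1:]

print(X.shape, Y.shape)
```

The result is a read-only view onto the original data, so it only stays cheap as long as nothing downstream forces a full copy.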
