Keras Recurrent Neural Networks For Multivariate Time Series


Problem Description


I have been reading about Keras RNN models (LSTMs and GRUs), and authors seem to largely focus on language data or univariate time series that use training instances composed of previous time steps. The data I have is a bit different.


I have 20 variables measured every year for 10 years for 100,000 persons as input data, and the 20 variables measured for year 11 as output data. What I would like to do is predict the value of one of the variables (not the other 19) for the 11th year.


I have my data structured as X.shape = [persons, years, variables] = [100000, 10, 20] and Y.shape = [persons, variable] = [100000, 1]. Below is my Python code for an LSTM model.

## LSTM model.

from keras import models
from keras import layers

# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh', 
     input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.
history_lstm = network_lstm.fit(X, Y, epochs = 25, batch_size = 128)


I have four (related) questions, please:


  1. Have I coded the Keras model correctly for the data structure I have? The performance I get from a fully-connected network (using flattened data) and from LSTM, GRU, and 1D CNN models is nearly identical, and I don't know if I have made an error in Keras or if a recurrent model is simply not helpful in this case.


  2. Should I have Y as a series with shape Y.shape = [persons, years] = [100000, 11], rather than including the variable in X, which would then have shape X.shape = [persons, years, variables] = [100000, 10, 19]? If so, how can I get the RNN to output the predicted sequence? When I use return_sequences = True, Keras returns an error. (A sketch of a sequence-output setup follows this list.)


  3. Is this the best way to predict with the data I have? Are there better options available in the Keras RNN models, or even other models?


  4. How could I simulate data resembling the data structure I have so that an RNN model would outperform a fully-connected network?
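
For question 2, here is a minimal sketch of a sequence-output setup. It is illustrative only: the name network_seq and the reshaped arrays X_seq and Y_seq are assumptions, not from the original post. It assumes the target variable is moved out of X (leaving 19 features) and that Y holds one value per input time step; with return_sequences = True the output sequence length must equal the input sequence length (10), which is why an 11-year target cannot be aligned directly.

## Hypothetical sequence-output model (sketch).

from keras import models
from keras import layers

network_seq = models.Sequential()
network_seq.add(layers.LSTM(128, activation = 'tanh', 
     return_sequences = True,    # emit one output per time step
     input_shape = (10, 19)))
network_seq.add(layers.TimeDistributed(layers.Dense(1, activation = None)))

network_seq.compile(optimizer = 'adam', loss = 'mean_squared_error')

# network_seq.fit(X_seq, Y_seq, epochs = 25, batch_size = 128)
# where X_seq.shape == (100000, 10, 19) and Y_seq.shape == (100000, 10, 1).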

UPDATE:


I have tried a simulation, with what I hope is a very simple case where an RNN should be expected to outperform an FNN.


While the LSTM tends to outperform the FNN when both have fewer hidden units (4), the performance becomes identical with more hidden units (8+). Can anyone think of a better simulation where an RNN would be expected to outperform an FNN with a similar data structure?

from keras import models
from keras import layers
from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt


The code below simulates data for 10,000 instances, 10 time steps, and 2 variables. If the second variable has a 0 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 3. If the second variable has a 1 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 9.


My hope was that the RNN would keep the value of the second variable at the very first time step in memory and use that to know which value (3 or 9) to multiply the first variable from the very last time step by.

## Simulate data.

instances = 10000
sequences = 10

# Columns alternate between the two variables: even columns hold the first
# variable (uniform random draws), odd columns hold the second variable.
X = np.zeros((instances, sequences * 2))

# For half the instances, the second variable is 1 in the very first time step.
X[:int(instances / 2), 1] = 1

for i in range(instances):
    for j in range(0, sequences * 2, 2):
        X[i, j] = np.random.random()

# Y is the first variable at the last time step, multiplied by 3 or 9
# depending on the second variable at the first time step.
Y = np.zeros((instances, 1))

for i in range(len(Y)):
    if X[i, 1] == 0:
        Y[i] = X[i, -2] * 3
    if X[i, 1] == 1:
        Y[i] = X[i, -2] * 9

Below is code for the FNN:

## Densely connected model.

# Define model.
network_dense = models.Sequential()
network_dense.add(layers.Dense(4, activation = 'relu', 
     input_shape = (X.shape[1],)))
network_dense.add(Dense(1, activation = None))

# Compile model.
network_dense.compile(optimizer = 'rmsprop', loss = 'mean_absolute_error')

# Fit model.
history_dense = network_dense.fit(X, Y, epochs = 100, batch_size = 256, verbose = False)

# Plot predicted vs. actual values for each subgroup of the second variable.
plt.scatter(Y[X[:, 1] == 0, :], network_dense.predict(X[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y[X[:, 1] == 1, :], network_dense.predict(X[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()


Below is code for an LSTM:

## Structure X data for LSTM.

# Reshape the interleaved columns into (instances, time steps, variables).
X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)
X_lstm.shape

## LSTM model.

# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(4, activation = 'relu', 
     input_shape = (X_lstm.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.
network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')

# Fit model.
history_lstm = network_lstm.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)

# Plot predicted vs. actual values for each subgroup of the second variable.
plt.scatter(Y[X[:, 1] == 0, :], network_lstm.predict(X_lstm[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y[X[:, 1] == 1, :], network_lstm.predict(X_lstm[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

Recommended Answer


  1. Yes, the code used is correct for what you are trying to do. 10 years is the time window used to predict the following year, so that should be the number of inputs into your model for each of the 20 variables. The sample size of 100,000 observations is not relevant to the input shape of your model.
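
To restate the shapes concretely (a comment-only sketch using the names from the question's code):

# input_shape excludes the sample dimension: it is (time steps, variables).
# X.shape == (100000, 10, 20)  -> persons, time steps, variables
# Y.shape == (100000, 1)       -> one target value per person
# layers.LSTM(128, input_shape = (X.shape[1], X.shape[2]))  # i.e. (10, 20)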


  2. The way that you had originally shaped the dependent variable Y is correct. You are predicting a window of 1 year for 1 variable, and you have 100,000 observations. The keyword argument return_sequences=True will cause an error to be thrown because you only have a single LSTM layer. Set this parameter to True if you are implementing multiple LSTM layers and the layer in question is followed by another LSTM layer, as in the sketch below.
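
A minimal sketch of that stacked setup (my illustration, not code from the answer; the name network_stacked and the layer sizes are arbitrary):

from keras import models
from keras import layers

network_stacked = models.Sequential()
network_stacked.add(layers.LSTM(128, activation = 'tanh', 
     return_sequences = True,    # pass the full sequence to the next LSTM
     input_shape = (X.shape[1], X.shape[2])))
network_stacked.add(layers.LSTM(64, activation = 'tanh'))    # returns only the last step
network_stacked.add(layers.Dense(1, activation = None))

network_stacked.compile(optimizer = 'adam', loss = 'mean_squared_error')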


  3. I wish I could offer some guidance on question 3, but without actually having your dataset I don't know if it's possible to answer this with any sort of certainty.


I will say that LSTMs were designed to address what is known as the long-term dependency problem present in regular RNNs. What this problem boils down to is that as the gap grows between when the relevant information was observed and the point where that information would be useful, the standard RNN will have a harder time learning the relationship between them. Think of predicting a stock price based on 3 days of activity vs. an entire year.


This leads into number 4. If I use the term 'resembling' loosely and stretch your time window further out, to say 50 years as opposed to 10, the advantages gained from using an LSTM would become more apparent. A sketch of such a stretched simulation follows. Although I'm sure that someone more experienced will be able to offer a better answer, I look forward to seeing it.
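
As an illustration of that suggestion (my assumption: simply extend the questioner's simulation from 10 to 50 time steps, so the flag in the first step must survive a much longer gap before it is used), a vectorized sketch:

import numpy as np

instances = 10000
sequences = 50    # was 10 in the simulation above

X = np.zeros((instances, sequences * 2))
X[:instances // 2, 1] = 1    # flag in the very first time step
X[:, 0::2] = np.random.random((instances, sequences))    # first variable at every step

# Same target rule as before: first variable at the last step, times 3 or 9.
Y = np.where(X[:, 1] == 0, X[:, -2] * 3, X[:, -2] * 9).reshape(-1, 1)

X_lstm = X.reshape(X.shape[0], sequences, 2)    # same reshape as before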


I found this page helpful for understanding LSTMs:

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

