Format time-series data for short-term forecasting using recurrent neural networks


Question

I want to forecast day-ahead power consumption using recurrent neural networks (RNNs), but I find the data format they require (samples, timesteps, features) confusing. Let me explain with an example.

I have power_dataset.csv on Dropbox, which contains power consumption from 5 June to 18 June at 10-minute intervals (144 observations per day). To check the performance of an RNN using the rnn R package, I am following these steps:

  1. Train model M for the usage of 17 June using data from 5-16 June.
  2. Predict the usage of 18 June using M and the updated usage from 6-17 June.

My understanding of the RNN data format is:

Samples: the number of samples or observations.

Timesteps: the number of steps after which the pattern repeats. In my case, 144 observations occur in a day, so every 144 consecutive observations constitute the timesteps. In other words, it defines the seasonality period.

Features: the number of features, which is one in my case, i.e., the consumption time series of the historical days.

So, my script is:

library(rnn)
df <- read.csv("power_dataset.csv")
train <- df[1:2016,] # train set from 5-16 June
test <- df[145:dim(df)[1],] # test set from 6-18 June
# prepare data to train a model
trainX <- train[1:1872,]$power # using only power column now
trainY <- train[1873:dim(train)[1],]$power
# data formatting acc. to rnn as [samples, timesteps, features]
tx <-  array(trainX,dim=c(NROW(trainX),144,1))
ty <-  array(trainY,dim=c(NROW(trainY),144,1))
model <- trainr(X=tx,Y=ty,learningrate = 0.04, hidden_dim = 10, numepochs = 100)

The error output is:

The sample dimension of X is different from the sample dimension of Y.

The error is caused by incorrect data formatting. How can I format the data correctly?

Recommended answer

A few points:

  1. To start with, the input X and the output Y in the training data must have the same number of samples. In the implementation above, X has 1872 samples and Y has 144. Moreover, your training array tx contains the same column replicated 144 times, which does not make much sense.
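
The replication mentioned above comes from how base R's array() recycles its input to fill the requested dimensions. A minimal sketch, using synthetic integers in place of the real power column (the names tx_wrong and ty_wrong are illustrative, not from the original script):

```r
# Stand-ins for the asker's trainX and trainY (synthetic values, same lengths).
trainX <- seq_len(1872)
trainY <- seq_len(144)

# array() fills column by column and recycles the data when it runs out,
# so each of the 144 "timestep" columns is a copy of the same 1872 values.
tx_wrong <- array(trainX, dim = c(length(trainX), 144, 1))
ty_wrong <- array(trainY, dim = c(length(trainY), 144, 1))

dim(tx_wrong)                                 # 1872 144 1
all(tx_wrong[, 1, 1] == tx_wrong[, 2, 1])     # TRUE: columns are identical
all(tx_wrong[, 1, 1] == tx_wrong[, 144, 1])   # TRUE

# The first (sample) dimension differs between X and Y, hence the error.
dim(tx_wrong)[1] == dim(ty_wrong)[1]          # FALSE
```

Reshaping the series into a matrix first, so that each row or column holds a distinct slice of the data, avoids both problems.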

We can think of training an RNN or LSTM model in a few ways. In the figure below, Model1 tries to capture recurring patterns across the 10-minute time intervals, whereas Model2 tries to capture the recurring pattern across the (previous) days.

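# Model1: each sample is one day, each timestep is one 10-minute interval
window <- 144
train <- df[1:(13*window),]$power
# one row per day (13 x 144)
tx <- t(sapply(1:13, function(x) train[((x-1)*window+1):(x*window)]))
ty <- tx[2:13,]       # targets: days 2-13
tx <- tx[-nrow(tx),]  # inputs: days 1-12
# reshape to [samples, timesteps, features]
tx <-  array(tx,dim=c(NROW(tx),NCOL(tx),1))
ty <-  array(ty,dim=c(NROW(ty),NCOL(ty),1))
model <- trainr(X=tx,Y=ty,learningrate = 0.01, hidden_dim = 10, numepochs = 100)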
# days 2-13 as input, in the same [day, interval] layout as tx
test <- t(sapply(2:13, function(x) train[((x-1)*window+1):(x*window)]))
pred  <- predictr(model,X=array(test,dim=c(NROW(test),NCOL(test),1)))

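# Model2: each sample is one 10-minute interval, each timestep is one day
window <- 144
train <- df[1:(13*window),]$power
# one column per day (144 x 12): inputs are days 1-12
tx <- sapply(1:12, function(x) train[((x-1)*window+1):(x*window)])
# target: day 13, one value per 10-minute interval
ty <- train[(12*window+1):(13*window)]
# reshape to [samples, timesteps, features]
tx <-  array(tx,dim=c(NROW(tx),NCOL(tx),1))
ty <-  array(ty,dim=c(NROW(ty),1,1))
model <- trainr(X=tx,Y=ty,learningrate = 0.01, hidden_dim = 10, numepochs = 100, seq_to_seq_unsync=TRUE)
# days 2-13 as input, in the same [interval, day] layout as tx
test <- sapply(2:13, function(x) train[((x-1)*window+1):(x*window)])
pred  <- predictr(model,X=array(test,dim=c(NROW(test),NCOL(test),1)))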
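
The two layouts can be checked without the rnn package or the CSV file. The sketch below is only an illustration: it assumes a synthetic sine series in place of df$power, and the names power, days, x1, y1, x2 and y2 are made up for the example.

```r
window <- 144   # readings per day at 10-minute intervals
n_days <- 13    # number of days used for training in the two models

# Synthetic daily pattern standing in for df$power (an assumption).
power <- sin(2 * pi * seq_len(n_days * window) / window)

# One row per day, one column per 10-minute slot.
days <- matrix(power, nrow = n_days, ncol = window, byrow = TRUE)

# Model1 layout: samples are days, timesteps are 10-minute slots.
x1 <- array(days[-n_days, ], dim = c(n_days - 1, window, 1))  # days 1-12
y1 <- array(days[-1, ],      dim = c(n_days - 1, window, 1))  # days 2-13

# Model2 layout: samples are 10-minute slots, timesteps are previous days.
x2 <- array(t(days[-n_days, ]), dim = c(window, n_days - 1, 1))  # days 1-12
y2 <- array(days[n_days, ],     dim = c(window, 1, 1))           # day 13

# In both layouts X and Y agree on the sample dimension.
stopifnot(dim(x1)[1] == dim(y1)[1], dim(x2)[1] == dim(y2)[1])
```

Checking dim() on X and Y before calling trainr is a cheap way to catch the sample-dimension mismatch reported in the question.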

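  2. Your data is too small, relative to the feature size, to train an RNN or LSTM. That is why both trained models are very poor and unusable. You could try collecting more data, training the model on it, and then using it for prediction.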
