Why does RNN always output 1


Question


I am using Recurrent Neural Networks (RNN) for forecasting, but for some weird reason it always outputs 1. Here I explain this with a toy example:


Example: Consider a matrix M of dimensions (360, 5) and a vector Y which contains the row sums of M. Now, using an RNN, I want to predict Y from M. Using the rnn R package, I trained the model as:

library(rnn)
M <- matrix(c(1:1800), ncol = 5, byrow = TRUE)  # matrix (say, features)
Y <- apply(M, 1, sum)                           # output equals the row sums of M
mt <- array(c(M), dim = c(NROW(M), 1, NCOL(M))) # format as [samples, timesteps, features]
yt <- array(c(Y), dim = c(NROW(M), 1, NCOL(Y))) # format the response the same way
model <- trainr(X = mt, Y = yt, learningrate = 0.5, hidden_dim = 10, numepochs = 1000) # training


One strange thing I observed while training is that epoch error is always 4501. Ideally, epoch error should decrease with the increase in epochs.
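A side note on the toy example above (an editor's observation, not part of the original exchange): `trainr` in the rnn package appears to squash its output through a sigmoid, so it can only produce values in (0, 1). Unscaled targets like these row sums (15 up to 8990) saturate that sigmoid, which is consistent with both the flat epoch error and the constant prediction of 1. A minimal sketch of min-max scaling the response before training and inverting it afterwards (`scale_y` and `unscale_y` are illustrative helper names, not rnn package functions):

```r
# Hypothetical helpers: min-max scale a response into [0, 1] for a
# sigmoid output unit, and invert the scaling after prediction.
scale_y   <- function(y, lo, hi) (y - lo) / (hi - lo)
unscale_y <- function(s, lo, hi) s * (hi - lo) + lo

M  <- matrix(1:1800, ncol = 5, byrow = TRUE)
Y  <- apply(M, 1, sum)       # targets range from 15 to 8990
lo <- min(Y); hi <- max(Y)
ys <- scale_y(Y, lo, hi)     # now in [0, 1], usable as trainr targets
head(unscale_y(ys, lo, hi))  # round-trips back to the original sums
```

At prediction time the same `lo`/`hi` from the training data would be reused, and the `predictr` output passed through `unscale_y` to recover the original scale.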


Next, I created a test dataset with the same structure as the one above:

M2 <- matrix(c(1:15),nrow=3,byrow = TRUE)
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)


With prediction, I always get the output as 1. What can be the reason for the constant epoch error and the same output?


The answer provided by @Barker does not solve my problem. To keep the question open, I share minimal data here via Dropbox links as traindata and testdata, along with my R code:


Data details: the column 'power' is the response variable; it is a function of temperature, humidity, and the power consumed on previous days, from day 1 to day 14.

normalize_data <- function(x){
  normalized = (x - min(x)) / (max(x) - min(x))
  return(normalized)
}

# read train and test data
traindat <- read.csv(file = "train.csv")
testdat  <- read.csv(file = "test.csv")
# column "power" is the response variable and the remaining columns are predictors
# predictors in train data (note the parentheses: 1:ncol(traindat) - 1 evaluates as 0:(n-1))
trainX <- traindat[, 1:(ncol(traindat) - 1)]
# response of train data
trainY <- traindat$power
# arrange data for the RNN as [samples, time steps, features]
tx <- array(as.matrix(trainX), dim = c(NROW(trainX), 1, NCOL(trainX)))
tx <- normalize_data(tx) # normalize data into the range [0, 1]
ty <- array(trainY, dim = c(NROW(trainY), 1, NCOL(trainY))) # arrange response like the predictors
# train model
model <- trainr(X = tx, Y = ty, learningrate = 0.08, hidden_dim = 6, numepochs = 400)

# predictors in test data
testX <- testdat[, 1:(ncol(testdat) - 1)]
testX <- normalize_data(testX) # normalize data into the range [0, 1]
#testY <- testdat$power
# arrange data for the RNN as [samples, time steps, features]
tx2 <- array(as.matrix(testX), dim = c(NROW(testX), 1, NCOL(testX)))
# predict
pred <- predictr(model, tx2)
pred


I varied the parameters learning rate, hidden_dim, and numepochs, but it still results in either 0.9 or 1.

Answer


Most RNNs don't like data that doesn't have a constant mean. One strategy for dealing with this is differencing the data. To see how this works, let's work with the base R time series co2. This is a time series with a nice smooth seasonality and trend, so we should be able to forecast it.
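As a quick illustration of the point (a toy example, not from the original answer): a lag-d difference removes both a linear trend and any seasonal component with period d, leaving a series with constant mean:

```r
# Toy series: linear trend plus a seasonal cycle with period 12
t  <- 1:48
x  <- ts(0.5 * t + 10 * sin(2 * pi * t / 12), frequency = 12)
# The lag-12 difference cancels the seasonal term and reduces the
# trend to a constant step: x[t] - x[t-12] = 0.5 * 12 = 6
dx <- diff(x, 12)
range(dx)  # every value is (numerically) 6
```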


For our model, the input matrix is going to be the "seasonality" and "trend" components of the co2 time series, created using the stl decomposition. So let's make our training and testing data as before and train the model (note I reduced numepochs for runtime). I will use all the data up to the last year and a half for training, and then use the last year and a half for testing:

#Create the STL decomposition
sdcomp <- stl(co2, s.window = 7)$time.series[,1:2]

Y <- window(co2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))
#Taken from OP's code
mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y))) 
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)


Now we can create our predictions on the last year and a half of testing data:

M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)

output:
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1
[11,]    1
[12,]    1
[13,]    1
[14,]    1
[15,]    1
[16,]    1
[17,]    1
[18,]    1


Ew, it is all ones again, just like in your example. Now let's try this again, but this time we will difference the data. Since we are trying to make our predictions a year and a half out, we will use 18 as our differencing lag, as those are the values we would know 18 months ahead of time.

dco2 <- diff(co2, 18)
sdcomp <- stl(dco2, s.window = "periodic")$time.series[,1:2]
plot(dco2)


Great, the trend is now gone, so our neural net should be able to find the pattern better. Let's try again with the new data.

Y <- window(dco2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))

mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y)))
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)

M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
(preds <- predictr(model,mt2))

output:
              [,1]
 [1,] 9.999408e-01
 [2,] 9.478496e-01
 [3,] 6.101828e-08
 [4,] 2.615463e-08
 [5,] 3.144719e-08
 [6,] 1.668084e-06
 [7,] 9.972314e-01
 [8,] 9.999901e-01
 [9,] 9.999916e-01
[10,] 9.999916e-01
[11,] 9.999916e-01
[12,] 9.999915e-01
[13,] 9.999646e-01
[14,] 1.299846e-02
[15,] 3.114577e-08
[16,] 2.432247e-08
[17,] 2.586075e-08
[18,] 1.101596e-07


Ok, now there is something there! Let's see how it compares to what we were trying to forecast, dco2:


Not ideal, but it is finding the general "up down" pattern of the data. Now all you have to do is tinker with your learning rates and start optimizing with all those lovely hyperparameters that make working with neural nets such a joy. When it is working how you want, you can just take your final output and add back in the last 18 months of your training data.
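That last step can be sketched as follows (an illustrative reconstruction: the true differences stand in for the model's rescaled output). Since dco2[t] = co2[t] - co2[t - 18], adding back the co2 values from 18 months earlier recovers the original scale:

```r
dco2  <- diff(co2, 18)                                      # lag-18 differences
preds <- window(dco2, start = c(1996, 7))                   # stand-in for model output
base  <- window(co2, start = c(1995, 1), end = c(1996, 6))  # co2 values 18 months earlier
fc    <- preds + as.numeric(base)                           # co2[t] = dco2[t] + co2[t - 18]
```

With real model output, `preds` would first be unscaled back from the (0, 1) range to the scale of dco2 before adding the base values.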

