使用ggplot2和funggcast函数预测 [英] Forecast with ggplot2 and funggcast function

查看:357
本文介绍了使用ggplot2和funggcast函数预测的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在这个网站上,达文波特先生发表了一个函数,在一个任意数据集的例子中用 ggplot2 绘制arima预测,他发表了< a href =http://davenportspatialanalytics.squarespace.com/blog/2012/3/21/plotting-forecast-objects-in-ggplot-part-2-visualize-observa.html =nofollow noreferrer>这里。我可以按照他的例子,没有任何错误信息。



现在,当我使用我的数据时,我会以警告结束:

  1:在window.default(x,...)中:'end'值未更改
2:在window.default(x,... ):'end'value not changed

我知道当我调用 pd< - funggcast(yt,yfor)由于数据问题,我在我的数据中指出 end = c(2013)。但我不知道如何解决这个问题。



这是我使用的代码:

  library(ggplot2)
library(zoo)
library(预测)
$ b $ myts <-ts(rnorm(55),start = c(1960 ),end = c(2013),freq = 1)
funggcast< - function(dn,fcast){

en < - max(time(fcast $ mean))#提取预测中使用的最大日期

#提取源和培训数据
ds< - as.data.frame(window(dn,end = en))
名称(ds)< - '观察到'
ds $ date< - as.Date(时间(窗口(dn,结束= en)))

#提取拟合值以计算出如何获取置信区间)
dfit < - as.data.frame(fcast $ fitted)
dfit $ date< - as.Date(time(fcast $ fitted))
名称(dfit)[1]< - '拟合'

ds< - 合并(ds,dfit,all.x = T)#合并具有源和训练数据的拟合值

#提取预测值和置信区间
dfcastn< - as.data.frame(fcast)
dfcastn $ da te < - as.Date(as.yearmon(row.names(dfcastn)))
names(dfcastn)<-c('forecast','lo80','hi80','lo95',' hi95','date')

pd< - merge(ds,dfcastn,all.x = T)#用于ggplot的最终data.frame
return(pd)


$ b yt< - window(myts,end = c(2013))#提取训练数据直到去年
yfit< - auto.arima(myts )#fit arima model
yfor< - forecast(yfit)#forecast
pd< - funggcast(yt,yfor)#使用函数fungcast()提取ggplot的数据

ggplot(data = pd,aes(x = date,y = observed))+ geom_line(color =red)+ geom_line(aes(y = fitted),color =blue)+ geom_line(aes(y =预测))+ geom_ribbon(aes(ymin = lo95,ymax = hi95),alpha = .25)+ scale_x_date(name =十年时间)+ scale_y_continuous(name =人均国内生产总值(当前美元)) + theme(axis.text.x = element_text(size = 10),legend.justification = c(0,1),legend.position = c(0,1))+ ggtitle(Arima(0,1,1) G的适合度和预测(1960-2013))+ scale_color_manual(values = c(Blue,Red),breaks = c(Fitted,Data,Forecast))

编辑:我找到另一个博客这里有一个函数用于 forecast ggplot2 但我想使用上面的方法,如果我能找到我的错误。任何人?



Edit2:
如果我使用我的数据
这里,比我在下面看到的图表要好。请注意,我没有为 mtys 更改 end = c(2023),否则它不会将预测与(WDI_gdp_capita $ Brazil,start = c(1960),end = c(2023) ),freq = 1)

funggcast< - 函数(dn,fcast){

en < - max(time(fcast $ mean))#提取最大值(dn,end = en))
名称(ds) < - 'observed'
ds $ date< - as.Date(time(window(dn,end = en)))

#提取拟合值(需要弄清楚如何获取置信区间)
dfit < - as.data.frame(fcast $ fitted)
dfit $ date< - as.Date(time(fcast $ fitted))
名称(dfit)[1]< - '拟合'

ds< - 合并(ds,dfit,all = T)#将合成值与源和训练数据合并

#提取预测值和置信区间
dfcastn < - as.data.frame(fcast)
dfcastn $ date< - as.Date(paste(row.names(dfcastn),01,01,sep = - ))$ (dfcastn)< -c('forecast','lo80','hi80','lo95','hi95','date')

pd< - merge ds,dfcastn,all.x = T)#在ggplot中使用的最终data.frame
返回(pd)

}#ggplot函数由Frank Davenport

yt< - window(myts,end = c(2013))#提取直到去年的训练数据
yfit< - auto.arima(yt)#fit arima模型
yfor< - forecast yfit)#预测
pd < - funggcast(myts,yfor)#使用函数funggcast()提取ggplot的数据

ggplot(data = pd,aes(x = date,y = y观察))+ geom_line(color =red)+ geom_line(aes(y = fitted),color =blue)+ geom_line(aes(y = forecast))+ geom_ribbon(aes(ymin = lo95,ymax = (名称=人均GDP(当前美元))+主题(axis.text.x = element_text(size = 10))。 ,legend.justification = c(0,1),legend.position = c(0,1))+ ggtitle(Arima(0,1,1)巴西人均GDP(1960-2013)的拟合和预测 )+ scale_color_manual(values = c(Blue,Red),breaks = c(Fitted,Data,Forecast))+ ggsave((filename =gdp_forecast_ggplot.pdf),width = 330 ,height = 180,units = c(mm),dpi = 300,limitsize = TRUE)

我得到的几乎完美的图:



另外一个问题:如何在此图表中获得图例

如果将 end = c(2013) for myts ,我得到了和开头一样的图:

解决方案

达文波特先生的分析和你试图创作的情节有几点不同。
第一个是他将arima预测与一些观测数据进行比较,这就是为什么他在整个时间序列的一部分训练集上训练模型。
要做到这一点,你应该让你的初始时间系列更长:

  myts <-ts(rnorm(55 ),start = c(1960),end = c(2023),freq = 1)

然后在脚本结尾处,选择2013年以前的培训:

  yt < -  window(myts,end = c(2013))#提取训练数据直到去年

模型应该在训练集上训练,而不是整个时间系列,所以你应该改变yfit线:

  yfit < -  auto.arima(yt) #fit arima model 

使用整个时间序列调用funggcast函数,因为它需要观察到的数据:

  pd < -  funggcast(myts,yfor)

最后,他使用具有月份和年份的日期,因此在他的 funggcast 函数中,更改以下行:

  dfcastn $ date<  -  as.Date(as.yearmo n(row.names(dfcastn)))

收件人:

  dfcastn $ date<  -  as.Date(paste(row.names(dfcastn),01,01,sep = - ))

这是因为模型预测的值需要更改为日期,如2014年必须更改到2014-01-01,以便与观察到的数据合并。



完成所有更改后,代码如下所示:

  library(ggplot2)
library(zoo)
library(预测)

myts< - ts (rnorm(55),start = c(1960),end = c(2013),freq = 1)
funggcast< - function(dn,fcast){

en< - max(time(fcast $ mean))#提取预测中使用的最大日期

#提取源和训练数据
ds < - as.data.frame(window(dn ,end = en))
names(ds)< - 'observed'
ds $ date< - as.Date(time(window(dn,end = en)))

#提取拟合值(需要以计算出如何获取置信区间)
dfit < - as.data.frame(fcast $ fitted)
dfit $ date< - as.Date(time(fcast $ fitted))
名称(dfit)[1]< - '拟合'

ds< - 合并(ds,dfit,all.x = T)#合并具有源和训练数据的拟合值

#提取预测值和置信区间
dfcastn< - as.data.frame(fcast)
dfcastn $ date< - as.Date(paste(row.names( (dfcastn),01,01,sep = - ))
名称(dfcastn)<-c('forecast','lo80','hi80','lo95','hi95' ,'date')

pd< - 合并(ds,dfcastn,all = T)#用于ggplot的最终data.frame
返回(pd)


$ b $ yt < - window(myts,end = c(2013))#提取训练数据直到去年
yfit < - auto.arima(yt)#fit arima模型
yfor< - 预测(yfit)#预测
pd< - funggcast(myts,yfor)#提取ggplot的数据u sing函数funggcast()

plotData< -ggplot(data = pd,aes(x = date,y = observed))+ geom_line(aes(color =1))+
geom_line(aes(y = fitted,color =2))+
geom_line(aes(y = forecast,color =3))+
scale_colour_manual(values = c(red blue,black),labels = c(Observed,Fitted,Forecasted),name =Data)+
geom_ribbon(aes(ymin = lo95,ymax = hi95), alpha = .25)+
scale_x_date(name =十年时间)+
scale_y_continuous(name =人均GDP(当前美元))+
主题(axis.text巴西(1960-2013)人均国内生产总值的拟合和预测)

($) plotData

你得到一个看起来像这样的情节,拟合非常糟糕,随机时间系列。此外ggplot会输出一些错误,因为预测线在2013年之前没有数据,拟合的数据在2013年之后不会继续。(我运行了几次,取决于初始随机时间序列,模型可能只是预测到处都是0)





编辑:更改 pd 赋值行,以防2013年后没有观察到数据



Edit2:我在代码末尾更改了ggplot函数,以确保图例显示出来

On this website, Mr. Davenport published a function to plot an arima forecast with ggplot2 on the example of an arbitrary dataset, he published here. I can follow his example without any error message.

Now, when I use my data, I would end with the warning:

1: In window.default(x, ...) : 'end' value not changed
2: In window.default(x, ...) : 'end' value not changed

I know that it happens when I call this command pd <- funggcast(yt, yfor) due to an issue with the data I indicate in my data end = c(2013). But I do not know how to fix that.

This is the code I use:

library(ggplot2)
library(zoo)
library(forecast)

myts <- ts(rnorm(55), start = c(1960), end = c(2013), freq = 1)
funggcast <- function(dn, fcast){

en <- max(time(fcast$mean)) # Extract the max date used in the forecast

# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))

# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'

ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data

# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')

pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
return(pd)

} 

yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(myts) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(yt, yfor) # extract the data for ggplot using function funggcast()

ggplot(data = pd, aes(x = date,y = observed)) + geom_line(color = "red") + geom_line(aes(y = fitted), color = "blue") + geom_line(aes(y = forecast)) + geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) + scale_x_date(name = "Time in Decades") + scale_y_continuous(name = "GDP per capita (current US$)") + theme(axis.text.x = element_text(size = 10), legend.justification=c(0,1), legend.position=c(0,1)) + ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") + scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast"))

Edit: I found another blog here with a function to use with forecast and ggplot2 but I would like to use the approach above, if I were able to find my mistake. Anyone?

Edit2: If I run your updated code with my data here, than I get the graph down below. Note that I did not change the end = c(2023) for mtys, otherwise it would not merge the forecasted with the fitted value.

myts <- ts(WDI_gdp_capita$Brazil, start = c(1960), end = c(2023), freq = 1)

funggcast <- function(dn, fcast){

  en <- max(time(fcast$mean)) # Extract the max date used in the forecast

  # Extract Source and Training Data
  ds <- as.data.frame(window(dn, end = en))
  names(ds) <- 'observed'
  ds$date <- as.Date(time(window(dn, end = en)))

  # Extract the Fitted Values (need to figure out how to grab confidence intervals)
  dfit <- as.data.frame(fcast$fitted)
  dfit$date <- as.Date(time(fcast$fitted))
  names(dfit)[1] <- 'fitted'

  ds <- merge(ds, dfit, all = T) # Merge fitted values with source and training data

  # Extract the Forecast values and confidence intervals
  dfcastn <- as.data.frame(fcast)
  dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
  names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')

  pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
  return(pd)

} # ggplot function by Frank Davenport

yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()

ggplot(data = pd, aes(x = date, y = observed)) + geom_line(color = "red") + geom_line(aes(y = fitted), color = "blue") + geom_line(aes(y = forecast)) + geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) + scale_x_date(name = "Time in Decades") + scale_y_continuous(name = "GDP per capita (current US$)") + theme(axis.text.x = element_text(size = 10), legend.justification=c(0,1), legend.position=c(0,1)) + ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") + scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast")) + ggsave((filename = "gdp_forecast_ggplot.pdf"), width=330, height=180, units=c("mm"), dpi = 300, limitsize = TRUE)

The almost perfect graph I get:

One additional question: How can I get a legend in this graph?

If I set end = c(2013) for myts, I get the same graph as in the beginning:

解决方案

There are several points that are different between Mr Davenport's analysis and the plot you are trying to make. The first one is that he is comparing the the arima forecast to some observed data, which is why he trains the model on a portion of the whole time series, the training set. To do this, you should make your initial time series longer:

myts <- ts(rnorm(55), start = c(1960), end = c(2023), freq = 1)

Then at the end of your script, where you select the training up to 2013:

yt <- window(myts, end = c(2013)) # extract training data until last year

The model should be trained on the training set, not the whole time series, so you should change the yfit line to:

yfit <- auto.arima(yt) # fit arima model

And call the funggcast function using the whole time series, because it needs the observed and fitted data:

pd <- funggcast(myts, yfor)

Finally, he uses dates that have month and year, so in his funggcast function, change this line:

dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))

To:

dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))

This is because the values predicted by the model need to be changed to dates, like 2014 has to be changed to 2014-01-01, in order to be merged with the observed data.

After all the changes, the code looks like this:

library(ggplot2)
library(zoo)
library(forecast)

myts <- ts(rnorm(55), start = c(1960), end = c(2013), freq = 1)
funggcast <- function(dn, fcast){

        en <- max(time(fcast$mean)) # Extract the max date used in the forecast

        # Extract Source and Training Data
        ds <- as.data.frame(window(dn, end = en))
        names(ds) <- 'observed'
        ds$date <- as.Date(time(window(dn, end = en)))

        # Extract the Fitted Values (need to figure out how to grab confidence intervals)
        dfit <- as.data.frame(fcast$fitted)
        dfit$date <- as.Date(time(fcast$fitted))
        names(dfit)[1] <- 'fitted'

        ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data

        # Extract the Forecast values and confidence intervals
        dfcastn <- as.data.frame(fcast)
        dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
        names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')

        pd <- merge(ds, dfcastn,all= T) # final data.frame for use in ggplot
        return(pd)

} 

yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()

plotData<-ggplot(data = pd, aes(x = date, y = observed)) + geom_line(aes(color = "1")) +
        geom_line(aes(y = fitted,color="2")) + 
        geom_line(aes(y = forecast,color="3")) +
        scale_colour_manual(values=c("red", "blue","black"),labels = c("Observed", "Fitted", "Forecasted"),name="Data")+
        geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25)+
        scale_x_date(name = "Time in Decades") +
        scale_y_continuous(name = "GDP per capita (current US$)")+
        theme(axis.text.x = element_text(size = 10)) + 
        ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)")

plotData

And you get a plot that looks like this, the fitting is pretty bad with a completely random time series. Also ggplot will output some errors because the forecast line has no data before 2013 and the fitted data does not go on after 2013. (I ran it several times, depending on the initial, random time series, the model might just predict 0 everywhere)

Edit: changed the pd assignment line as well, in case there are no observed data after 2013

Edit2: I changed the ggplot function at the end of the code to make sure the legend shows up

这篇关于使用ggplot2和funggcast函数预测的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆