使用 ggplot() 和 bsts() 包从贝叶斯时间序列分析中生成 BSTS 平均绝对百分比误差 (MAPE) 图 [英] Production of a BSTS Mean Absolute Percentage Error (MAPE) Plot from a Bayesian Time Series Analysis with MCMC using ggplot() and bsts() packages

查看:94
本文介绍了使用 ggplot() 和 bsts() 包从贝叶斯时间序列分析中生成 BSTS 平均绝对百分比误差 (MAPE) 图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题:

我有一个名为 FID(见下文)的数据框,其中包含 Year & 两列月份和 Sighting_Frequency(鸟类数量).

数据框包含 2015-2017 之间的 3 年 观察,表示我有 36 个月的数据.我使用 bsts 包 中的 bsts() 函数 对 MCMC 进行了 贝叶斯时间序列分析(请参阅下面的 R 代码)按照下面的教程进行操作.

我想生成一个保持平均绝对百分比误差(MAPE)图,如下图所示,它说明了使用包ggplot().

我在尝试生成 d2 数据帧(请参阅下面的教程和 R 代码)时遇到问题,因为我不断遇到此错误消息:-

data.frame 中的错误(c(10^as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn), :参数暗示不同的行数:48、32

我一直在努力找出问题所在.如果有人能帮我解决这个问题,我将不胜感激.

非常感谢.

教程

图表

R 代码:

##################################################################################时间序列模型使用 bsts() 函数###################################################################################打开包进行时间序列分析图书馆(润滑)图书馆(bsts)图书馆(dplyr)图书馆(ggplot2)##创建时间序列对象myts2 <- ts(BSTS_Dataframe$Sightings_Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)##上传数据到windows()函数x <- 窗口(myts2, start=c(2015, 01), end=c(2017, 12))y <- log(x)### 运行 bsts 模型ss <- AddLocalLinearTrend(list(), y)ss <- AddSeasonal(ss, y, nseasons = 3)# bsts.model <- bsts(y, state.specification = ss, family = "poisson", niter = 2, ping=0, seed=1234)bsts.model <- bsts(y, state.specification = ss, family = "logit", niter = 100, ping = 0, seed = 123)##打开绘图窗口dev.new()##绘制 bsts.model情节(bsts.model)##获取建议的烧伤次数burn<-bsts::SuggestBurn(0.1, bsts.model)##预测p<-predict.bsts(bsts.model,horizo​​n = 12,burn=burn,分位数=c(.25,.975))##实际与预测d2 <- data.frame(# 拟合值和预测c(10^as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),])+y),10^as.numeric(p$mean)),# 实际数据和日期as.numeric(BSTS_Dataframe$Sightings_Frequency),as.Date(时间(BSTS_Dataframe$Sightings_Frequency)))######################################错误信息######################################data.frame 中的错误(c(10^as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),:参数暗示不同的行数:48、32names(d2) <- c(拟合"、实际"、日期")### MAPE(平均绝对百分比误差)MAPE <- dplyr::filter(d2, year(Date)>2017) %>% dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))### 95% 预测可信区间后验间隔 <- cbind.data.frame(10^as.numeric(p$interval[1,]),10^as.numeric(p$interval[2,]),子集(d2,年(日期)>2017)$日期)名称(postterior.interval)<-c(LL",UL",日期")### 将区间加入预测d3 <- left_join(d2,terior.interval, by=日期")### 绘制实际与预测的图,并在保持期具有可信的间隔ggplot(数据=d3, aes(x=日期)) +geom_line(aes(y=Actual, color = "Actual"), size=1.2) +geom_line(aes(y=Fitted, color = "Fitted"), size=1.2, linetype=2) +theme_bw() + 主题(legend.title = element_blank()) + ylab("") + xlab("") +geom_vline(xintercept=as.numeric(as.Date(2017-12-01")), linetype=2) +geom_ribbon(aes(ymin=LL, ymax=UL), fill=灰色", alpha=0.5) +ggtitle(paste0("BSTS -- Holdout MAPE = ", round(100*MAPE,2), "%")) +主题(axis.text.x=element_text(角度 = -90,hjust = 0))

FID 数据帧

structure(list(Year = structure(1:32, .Label = c(2015-01", 2015-02",2015-03"、2015-04"、2015-05"、2015-08"、2015-09"、2015-10"、2015-11"、2015-12"、2016-01"、2016-02"、2016-03"、2016-04"、2016-05"、2016-07"、2016-08"、2016-09"、2016-10"、2016-11"、2016-12"、2017-01"、2017-02"、2017-03"、2017-04"、2017-05"、2017-07"、2017-08"、2017-09"、2017-10"、2017-11"、2017-12"), class = "factor"), Sightings_Frequency = c(36L, 28L, 39L,46L、5L、22L、10L、15L、8L、33L、33L、29L、31L、23L、8L、9L、40L, 41L, 40L, 30L, 30L, 44L, 37L, 41L, 42L, 20L, 7L, 27L, 35L,27L, 43L, 38L)), class = "data.frame", row.names = c(NA, -32L))

解决方案

########################################################################################带有 mcmc 的贝叶斯结构时间序列模型########################################################################################打开包进行时间序列分析图书馆(润滑)图书馆(bsts)图书馆(dplyr)图书馆(ggplot2)图书馆(ggfortify)####################################################################################时间序列模型使用 bsts() 函数#################################################################################BSTS_Dataframe$Year <- lubridate::ymd(paste0(FID$Year,"-01"))allDates <- seq.Date(分钟(FID$年),最大值(FID$年),月")FID <- FID %>% right_join(data.frame(Year = allDates), by = c(Year")) %>% dplyr::arrange(Year) %>%tidyr::fill(Sightings_Frequency, .direction = "down")##创建时间序列对象myts2 <- ts(FID$Sightings_Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)##上传数据到windows()函数x <- 窗口(myts2, start=c(2015, 01), end=c(2016, 12))y <- log(x)##为对象ss生成一个列表ss <- 列表()#ss <- AddLocalLinearTrend(list(), y)ss <- AddSeasonal(ss, y, nseasons = 12)ss <- AddLocalLevel(ss, y)# bsts.model <- bsts(y, state.specification = ss, family = "poisson", niter = 2, ping=0, seed=1234)# 如果这些是泊松分布的,则不需要使用 logit,因为它限制了响应# 介于 0-1 之间bsts.model <- bsts(y, state.specification = ss, niter = 100, ping = 0, seed = 123)##打开绘图窗口dev.new()##绘制 bsts.model情节(bsts.model)##获取建议的烧伤次数burn<-bsts::SuggestBurn(0.1, bsts.model)##预测p<-predict.bsts(bsts.model,horizo​​n = 12,burn=burn,分位数=c(.25,.975))p$均值##实际与预测d2 <- data.frame(# 拟合值和预测c(exp(as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),])+y)),exp(as.numeric(p$mean))),# 实际数据和日期as.numeric(FID$Sightings_Frequency),as.Date(FID$Year))names(d2) <- c(拟合"、实际"、日期")### MAPE(平均绝对百分比误差)MAPE <- dplyr::filter(d2, lubridate::year(Date)>=2017) %>%dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))### 95% 预测可信区间后验间隔 <- cbind.data.frame(exp(as.numeric(p$interval[1,])),exp(as.numeric(p$interval[2,])),尾(d2,12)$日期)名称(postterior.interval)<-c(LL",UL",日期")### 将区间加入预测d3 <- left_join(d2,terior.interval, by=日期")##打开绘图窗口dev.new()### 绘制实际与预测的图,并在保持期具有可信的间隔ggplot(数据=d3, aes(x=日期)) +geom_line(aes(y=Actual, color = "Actual"), size=1.2) +geom_line(aes(y=Fitted, color = "Fitted"), size=1.2, linetype=2) +theme_bw() + 主题(legend.title = element_blank()) + ylab("") + xlab("") +geom_vline(xintercept=as.numeric(as.Date(2017-12-01")), linetype=2) +geom_ribbon(aes(ymin=LL, ymax=UL), fill=灰色", alpha=0.5) +ggtitle(paste0("BSTS -- Holdout MAPE = ", round(100*MAPE,2), "%")) +主题(axis.text.x=element_text(角度 = -90,hjust = 0))

情节

Problem:

I have a data frame called FID (see below) that contains two columns for Year & Month, and Sighting_Frequency (counts of birds).

The data frame contains 3 years of observations between 2015-2017, indicating I have 36 months of data. I have run a Bayesian time series analysis with MCMC using the bsts() function in the bsts package (see the R-code below) by following the tutorial below.

I want to produce a holdout Mean Absolute Percentage Error (MAPE) Plot as seen in the diagram below, which illustrates the actual vs predicted values with credible intervals for the holdout period using the package ggplot().

I am getting stuck when I am attempting to produce the d2 data frame (see the tutorial and R-code below) because I keep on experiencing this error message:-

Error in data.frame(c(10^as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),  : 
  arguments imply differing number of rows: 48, 32

I have been struggling to figure out the problem. If anyone can help me solve this issue, I would be deeply appreciative.

Many thanks in advance.

Tutorial

https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/?fbclid=IwAR1q6QD5j6AW21FY2_gqDEq-bwBKDJNtg9alKm3bDytzS51w-dVkDZMdbT4

Diagram

R-code:

################################################################################
##Time Series Model using the bsts() function
##################################################################################

##Open packages for the time series analysis

library(lubridate)
library(bsts)
library(dplyr)
library(ggplot2)

##Create a time series object
myts2 <- ts(BSTS_Dataframe$Sightings_Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)

##Upload the data into the windows() function
x <- window(myts2, start=c(2015, 01), end=c(2017, 12))
y <- log(x)

### Run the bsts model
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 3)
# bsts.model <- bsts(y, state.specification = ss, family = "poisson", niter = 2, ping=0, seed=1234)
bsts.model <- bsts(y, state.specification = ss, family = "logit",  niter = 100, ping = 0, seed = 123)

##Open plotting window
dev.new()

##Plot the bsts.model
plot(bsts.model)

##Get a suggested number of burns
burn<-bsts::SuggestBurn(0.1, bsts.model)

##Predict

p<-predict.bsts(bsts.model, horizon = 12, burn=burn, quantiles=c(.25, .975))

##Actual vs predicted

d2 <- data.frame(
  # fitted values and predictions
  c(10^as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),])+y),  
    10^as.numeric(p$mean)),
   # actual data and dates 
     as.numeric(BSTS_Dataframe$Sightings_Frequency),
     as.Date(time(BSTS_Dataframe$Sightings_Frequency)))

 ######################################
 Error message
 ######################################

 Error in data.frame(c(10^as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),  : 
      arguments imply differing number of rows: 48, 32

names(d2) <- c("Fitted", "Actual", "Date")

### MAPE (mean absolute percentage error)
MAPE <- dplyr::filter(d2, year(Date)>2017) %>% dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))

### 95% forecast credible interval
posterior.interval <- cbind.data.frame(
  10^as.numeric(p$interval[1,]),
  10^as.numeric(p$interval[2,]), 
  subset(d2, year(Date)>2017)$Date)
names(posterior.interval) <- c("LL", "UL", "Date")

### Join intervals to the forecast
d3 <- left_join(d2, posterior.interval, by="Date")

### Plot actual versus predicted with credible intervals for the holdout period
ggplot(data=d3, aes(x=Date)) +
  geom_line(aes(y=Actual, colour = "Actual"), size=1.2) +
  geom_line(aes(y=Fitted, colour = "Fitted"), size=1.2, linetype=2) +
  theme_bw() + theme(legend.title = element_blank()) + ylab("") + xlab("") +
  geom_vline(xintercept=as.numeric(as.Date("2017-12-01")), linetype=2) + 
  geom_ribbon(aes(ymin=LL, ymax=UL), fill="grey", alpha=0.5) +
  ggtitle(paste0("BSTS -- Holdout MAPE = ", round(100*MAPE,2), "%")) +
  theme(axis.text.x=element_text(angle = -90, hjust = 0))

FID Dataframe

structure(list(Year = structure(1:32, .Label = c("2015-01", "2015-02", 
"2015-03", "2015-04", "2015-05", "2015-08", "2015-09", "2015-10", 
"2015-11", "2015-12", "2016-01", "2016-02", "2016-03", "2016-04", 
"2016-05", "2016-07", "2016-08", "2016-09", "2016-10", "2016-11", 
"2016-12", "2017-01", "2017-02", "2017-03", "2017-04", "2017-05", 
"2017-07", "2017-08", "2017-09", "2017-10", "2017-11", "2017-12"
), class = "factor"), Sightings_Frequency = c(36L, 28L, 39L, 
46L, 5L, 22L, 10L, 15L, 8L, 33L, 33L, 29L, 31L, 23L, 8L, 9L, 
40L, 41L, 40L, 30L, 30L, 44L, 37L, 41L, 42L, 20L, 7L, 27L, 35L, 
27L, 43L, 38L)), class = "data.frame", row.names = c(NA, -32L
))

解决方案

#######################################################################################
##A Bayesian Structural Time Series Model with mcmc
#######################################################################################

##Open packages for the time series analysis

library(lubridate)
library(bsts)
library(dplyr)
library(ggplot2)
library(ggfortify)

###################################################################################
##Time Series Model using the bsts() function
##################################################################################

BSTS_Dataframe$Year <- lubridate::ymd(paste0(FID$Year,"-01"))

allDates <- seq.Date(
               min(FID$Year),
               max(FID$Year),
               "month")

FID <- FID %>% right_join(data.frame(Year = allDates), by = c("Year")) %>% dplyr::arrange(Year) %>%
                     tidyr::fill(Sightings_Frequency, .direction = "down")

##Create a time series object
myts2 <- ts(FID$Sightings_Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)

##Upload the data into the windows() function
x <- window(myts2, start=c(2015, 01), end=c(2016, 12))
y <- log(x)

##Produce a list for the object ss
ss <- list()

#ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
ss <- AddLocalLevel(ss, y)
# bsts.model <- bsts(y, state.specification = ss, family = "poisson", niter = 2, ping=0, seed=1234)
# If these are poisson distributed, no need to use logit because it bounds reponse
# between 0-1
bsts.model <- bsts(y, state.specification = ss,  niter = 100, ping = 0, seed = 123)

##Open plotting window
dev.new()

##Plot the bsts.model
plot(bsts.model)

##Get a suggested number of burns
burn<-bsts::SuggestBurn(0.1, bsts.model)

##Predict

p<-predict.bsts(bsts.model, horizon = 12, burn=burn, quantiles=c(.25, .975))

p$mean

##Actual vs predicted

d2 <- data.frame(
  # fitted values and predictions
  c(exp(as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),])+y)),  
    exp(as.numeric(p$mean))),
  # actual data and dates
  as.numeric(FID$Sightings_Frequency),
  as.Date(FID$Year))

names(d2) <- c("Fitted", "Actual", "Date")

### MAPE (mean absolute percentage error)
MAPE <- dplyr::filter(d2, lubridate::year(Date)>=2017) %>%
  dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))

### 95% forecast credible interval
posterior.interval <- cbind.data.frame(
              exp(as.numeric(p$interval[1,])),
              exp(as.numeric(p$interval[2,])),
              tail(d2,12)$Date)

names(posterior.interval) <- c("LL", "UL", "Date")

### Join intervals to the forecast
d3 <- left_join(d2, posterior.interval, by="Date")

##Open plotting window
dev.new()

### Plot actual versus predicted with credible intervals for the holdout period
ggplot(data=d3, aes(x=Date)) +
  geom_line(aes(y=Actual, colour = "Actual"), size=1.2) +
  geom_line(aes(y=Fitted, colour = "Fitted"), size=1.2, linetype=2) +
  theme_bw() + theme(legend.title = element_blank()) + ylab("") + xlab("") +
  geom_vline(xintercept=as.numeric(as.Date("2017-12-01")), linetype=2) +
  geom_ribbon(aes(ymin=LL, ymax=UL), fill="grey", alpha=0.5) +
  ggtitle(paste0("BSTS -- Holdout MAPE = ", round(100*MAPE,2), "%")) +
  theme(axis.text.x=element_text(angle = -90, hjust = 0))

Plot

这篇关于使用 ggplot() 和 bsts() 包从贝叶斯时间序列分析中生成 BSTS 平均绝对百分比误差 (MAPE) 图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆