R: applying Holt Winters by group of columns to forecast time series
Question
I have daily time series data with frequency = 7, as follows:
# Multi-word values are quoted so that whitespace-separated parsing keeps them in one field
combo_1_daily_mini <- read.table(header = TRUE, text = "
region_1 region_2 region_3 date incidents
USA CA 'San Francisco' 1/1/15 37
USA CA 'San Francisco' 1/2/15 30
USA CA 'San Francisco' 1/3/15 31
USA CA 'San Francisco' 1/4/15 33
USA CA 'San Francisco' 1/5/15 28
USA CA 'San Francisco' 1/6/15 33
USA CA 'San Francisco' 1/7/15 39
USA PA Pittsburg 1/1/15 38
USA PA Pittsburg 1/2/15 35
USA PA Pittsburg 1/3/15 37
USA PA Pittsburg 1/4/15 33
USA PA Pittsburg 1/5/15 30
USA PA Pittsburg 1/6/15 33
USA PA Pittsburg 1/7/15 25
Greece Macedonia Skopje 1/1/15 29
Greece Macedonia Skopje 1/2/15 37
Greece Macedonia Skopje 1/3/15 28
Greece Macedonia Skopje 1/4/15 38
Greece Macedonia Skopje 1/5/15 27
Greece Macedonia Skopje 1/6/15 38
Greece Macedonia Skopje 1/7/15 39
Italy Trentino Trento 1/1/15 35
Italy Trentino Trento 1/2/15 31
Italy Trentino Trento 1/3/15 34
Italy Trentino Trento 1/4/15 34
Italy Trentino Trento 1/5/15 26
Italy Trentino Trento 1/6/15 33
Italy Trentino Trento 1/7/15 27
")
dput(trst, control = "all")
structure(list(region_1 = structure(c(3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Greece", "Italy", "USA"), class = "factor"),
region_2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L,
4L, 4L, 4L, 4L), .Label = c("CA", "Macedonia", "PA", "Trentino"
), class = "factor"), region_3 = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Pittsburg",
"San Francisco", "Skopje", "Trento"), class = "factor"),
date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L,
5L, 6L, 7L), .Label = c("1/1/15", "1/2/15", "1/3/15", "1/4/15",
"1/5/15", "1/6/15", "1/7/15"), class = "factor"), incidents = c(37L,
30L, 31L, 33L, 28L, 33L, 39L, 38L, 35L, 37L, 33L, 30L, 33L,
25L, 29L, 37L, 28L, 38L, 27L, 38L, 39L, 35L, 31L, 34L, 34L,
26L, 33L, 27L)), .Names = c("region_1", "region_2", "region_3",
"date", "incidents"), class = "data.frame", row.names = c(NA,
-28L))
Each region_1, region_2, region_3 group has its own seasonality and trend.
I am trying to forecast the number of incidents for the next week based on historical data. I have 6 months of history, from January 1, 2015 to June 30, 2015, for 32 countries. Each country has many region_2 and region_3 values, for a total of 32,356 unique region_1, region_2, region_3 time series.
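I have two questions:
- Problem - When I apply Holt Winters inside the by() function, I get warnings that I cannot interpret. Any help in understanding them would be appreciated.
Here is my code: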
ts_fun <- function(x) {
  ts_y <- ts(x, frequency = 7)
}
hw_fun <- function(x) {
  ts_y <- ts_fun(x)
  ts_h <- HoltWinters(ts_y)
}
combo_1_daily_mini$region_1 <- as.factor(combo_1_daily_mini$region_1)
combo_1_daily_mini$region_2 <- as.factor(combo_1_daily_mini$region_2)
combo_1_daily_mini$region_3 <- as.factor(combo_1_daily_mini$region_3)
combo_1_ts <- by(combo_1_daily_mini,
                 list(combo_1_daily_mini$region_1,
                      combo_1_daily_mini$region_2,
                      combo_1_daily_mini$region_3),
                 ts_fun)
combo_1_hw <- by(combo_1_daily_mini,
                 list(combo_1_daily_mini$region_1,
                      combo_1_daily_mini$region_2,
                      combo_1_daily_mini$region_3),
                 hw_fun)
Warning messages:
1: In HoltWinters(ts_y) :
optimization difficulties: ERROR: ABNORMAL_TERMINATION_IN_LNSRCH
2: In HoltWinters(ts_y) :
optimization difficulties: ERROR: ABNORMAL_TERMINATION_IN_LNSRCH
3: In HoltWinters(ts_y) :
optimization difficulties: ERROR: ABNORMAL_TERMINATION_IN_LNSRCH
4: In HoltWinters(ts_y) :
optimization difficulties: ERROR: ABNORMAL_TERMINATION_IN_LNSRCH
- Question - Is the way I am applying the function by multiple columns correct? Is there a better way? I essentially want next week's forecast numbers by region_1, region_2, region_3, for which I plan to use the following code:
nw_forecast <- forecast(combo_1_hw, 7)
I am able to apply the Holt Winters function and forecast when I create the time series separately for each region_1, region_2, region_3 combination. That approach is not feasible here, because my dataset has 32,356 unique combinations.
Thanks for your help.
Recommended answer
You may want to look at the tsibble and fable packages from the Hyndman group:
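library(dplyr)    # provides %>% and mutate()
library(tsibble)
library(fable)

# Parse the m/d/y dates and declare the three region columns as the series key
combo_1_daily_mini <- combo_1_daily_mini %>%
  mutate(date = lubridate::mdy(date)) %>%
  as_tsibble(index = date, key = c('region_1', 'region_2', 'region_3'))

# Fit one ETS model per key, forecast the next 7 days, and plot against the history
combo_1_daily_mini %>%
  model(ets = ETS(box_cox(incidents, 0.3))) %>%
  forecast(h = 7) %>%
  autoplot(combo_1_daily_mini)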
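Since the question asks for next week's numbers rather than a plot, the same approach can return a plain table. The following is a minimal sketch, not part of the original answer: the `toy` data frame is made-up stand-in data (two series, four weeks of daily counts), the Box-Cox transformation is left out for brevity, and it assumes a recent fable version in which forecasts expose a `.mean` column.

```r
library(dplyr)
library(tsibble)
library(fable)

# Hypothetical stand-in data: two series with 28 daily observations each
set.seed(1)
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 28)
toy <- data.frame(
  region_3  = rep(c("San Francisco", "Trento"), each = 28),
  date      = rep(dates, times = 2),
  incidents = rpois(56, lambda = 30)
)

# Declare the date index and the series key
toy_ts <- as_tsibble(toy, index = date, key = region_3)

# One ETS model per key, then a 7-day-ahead forecast kept as point forecasts
next_week <- toy_ts %>%
  model(ets = ETS(incidents)) %>%
  forecast(h = 7) %>%
  as_tibble() %>%
  select(region_3, date, .mean)
```

`next_week` then holds one row per series and forecast day, which can be written out or joined back to other tables.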
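If staying with base R's HoltWinters() is preferred, note that by() passes each group's whole data frame to the function, not only the incidents column. The sketch below is one possible alternative, not the asker's or answerer's code: it splits only the numeric column by group and wraps each fit in tryCatch() so that one failing series does not stop the loop. The `toy` data, the helper name `hw_next_week`, and the use of two grouping columns instead of three are assumptions made for illustration.

```r
# Hypothetical stand-in data: two groups, six weeks of daily counts with a weekly pattern
set.seed(42)
weekly <- c(30, 28, 27, 29, 33, 38, 36)
toy <- data.frame(
  region_1  = rep(c("USA", "Italy"), each = 42),
  region_3  = rep(c("San Francisco", "Trento"), each = 42),
  incidents = rep(weekly, times = 12) + rpois(84, lambda = 3)
)

# Fit Holt-Winters to one numeric vector and return the next 7 point forecasts,
# or NAs if the fit raises an error (for example, fewer than two full weeks of data)
hw_next_week <- function(y) {
  tryCatch({
    fit <- HoltWinters(ts(y, frequency = 7))
    as.numeric(predict(fit, n.ahead = 7))
  }, error = function(e) rep(NA_real_, 7))
}

# Split only the incidents column by the grouping columns;
# drop = TRUE skips combinations that never occur in the data
groups <- split(toy$incidents, toy[c("region_1", "region_3")], drop = TRUE)
forecasts <- lapply(groups, hw_next_week)
```

`do.call(rbind, forecasts)` turns the result into a matrix with one row per group and one column per forecast day. Warnings from the optimizer are still reported; only errors are caught.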