How to create a list of lists and then perform a vectorised function over it


Question

I'm looking for help with two specific points: 1) how to create a list of lists from my data frame (all.df) below, and 2) how to vectorise a function over this list of lists.

I'm trying to generate a forecast at the customer/product level using the Prophet library, and I'm struggling to vectorise the operation. I currently run a for loop, which I want to avoid in order to speed up my calculations.

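library(lubridate)  # provides dmy()
library(dplyr)      # provides %>%, select(), rename()
library(prophet)

set.seed(1123)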
df1 <- data.frame(
  Date     = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),

  Customer = "a",
  Product  =  "xxx",
  Revenue  = sample(1:100, 365*2, replace=TRUE))


df2 <- data.frame(
  Date     = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),

  Customer = "a",
  Product  =  "yyy",
  Revenue  = sample(25:200, 365*2, replace=TRUE)) 


df3 <- data.frame(  
  Date     = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),

  Customer = "b",
  Product  =  "xxx",
  Revenue  = sample(1:100, 365*2, replace=TRUE))



df4 <- data.frame(  
  Date     = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),

  Customer = "b",
  Product  =  "yyy",
  Revenue  = sample(25:200, 365*2, replace=TRUE) )

all.df <- rbind(df1, df2, df3, df4)


Here is my forecast function:

daily_forecast <- function(df, forecast.days = 365){

  # fit actuals into prophet
  m <- prophet(df, 
               yearly.seasonality = TRUE,
               weekly.seasonality = TRUE,
               changepoint.prior.scale = 0.55)  # default value is 0.05

  # create dummy data frame to hold predictions
  future <- make_future_dataframe(m, periods = forecast.days, freq = "day")

  # run the prediction 
  forecast <- predict(m, future)

  # select the date and forecast from the model, then merge with actuals
  daily_fcast     <- forecast %>% dplyr::select(ds, yhat) %>% dplyr::rename(Date = ds, fcast.daily = yhat) 
  actual.to.merge <- df %>% dplyr::rename(Date = ds, Actual.Revenue = y)

  # return the merged data frame explicitly (a bare assignment would return it invisibly)
  merge(actual.to.merge, daily_fcast, all = TRUE)
}


Currently, I process one customer/product at a time using a for loop:

x <- df1 %>% select(-c(Customer, Product)) %>% 
  dplyr::rename(ds = Date, y = Revenue) %>%
  daily_forecast()
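
The loop itself is not shown above. As a rough sketch of what such a loop might look like, the base R example below iterates over every customer/product pair. The `toy` data frame and the `toy_forecast()` function are hypothetical stand-ins (made up for illustration so the example runs without Prophet); in practice `all.df` and `daily_forecast()` would take their place.

```r
# Toy data with the same columns as all.df (values are made up for illustration)
toy <- data.frame(
  Date     = rep(seq(as.Date("2017-01-01"), by = "day", length.out = 3), times = 4),
  Customer = rep(c("a", "a", "b", "b"), each = 3),
  Product  = rep(c("xxx", "yyy", "xxx", "yyy"), each = 3),
  Revenue  = 1:12
)

# Hypothetical stand-in for daily_forecast(): just returns the mean of y
toy_forecast <- function(df) mean(df$y)

# One row per customer/product pair
combos  <- unique(toy[, c("Customer", "Product")])
results <- vector("list", nrow(combos))

for (i in seq_len(nrow(combos))) {
  # Keep only the rows for this pair and rename to the ds / y columns
  rows  <- toy$Customer == combos$Customer[i] & toy$Product == combos$Product[i]
  piece <- data.frame(ds = toy$Date[rows], y = toy$Revenue[rows])
  results[[i]] <- toy_forecast(piece)
}
names(results) <- paste(combos$Customer, combos$Product, sep = ".")
```

Each element of `results` holds the output for one pair, named like `a.xxx`.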

Instead, I would like to vectorise the whole operation:

1-Create a list of lists, i.e. split all.df by:

a) Product, then

b) Customer

2-Then map the daily_forecast function over the list of lists created in 1) above

I would very much like to use functions from purrr.

Recommended answer

Here is how I would do what you're asking with purrr:

library(tidyverse)
library(lubridate)
library(prophet)

res <-
  all.df %>% 
  split(.$Customer) %>% 
  map(~ split(.x, .x$Product)) %>% 
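  # at_depth() was deprecated in later purrr releases; map_depth() (purrr >= 0.3.0) does the same job here
  map_depth(2, ~ select(.x, ds = Date, y = Revenue)) %>% 
  map_depth(2, daily_forecast)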
str(res)
# List of 2
#  $ a:List of 2
# ..$ xxx:'data.frame': 1095 obs. of  3 variables:
# .. ..$ Date          : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 76 87 87 56 83 17 19 72 92 35 ...
# .. ..$ fcast.daily   : num [1:1095] 55.9 57.9 51.9 51.9 54 ...
# ..$ yyy:'data.frame': 1095 obs. of  3 variables:
# .. ..$ Date          : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 62 87 175 186 168 190 30 192 119 170 ...
# .. ..$ fcast.daily   : num [1:1095] 121 121 119 119 116 ...
# $ b:List of 2
# ..$ xxx:'data.frame': 1095 obs. of  3 variables:
# .. ..$ Date          : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 71 94 81 32 85 59 59 55 50 50 ...
# .. ..$ fcast.daily   : num [1:1095] 51.9 54.2 54.5 53.1 51.9 ...
# ..$ yyy:'data.frame': 1095 obs. of  3 variables:
# .. ..$ Date          : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 105 46 153 136 59 59 34 72 70 85 ...
# .. ..$ fcast.daily   : num [1:1095] 103.3 103.3 103.1 103.1 91.5 ...
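
To see the nesting and the depth-2 mapping in isolation, here is a minimal sketch on toy data. It assumes purrr 0.3.0 or later for `map_depth()`; `toy` and `toy_forecast()` are hypothetical stand-ins for `all.df` and `daily_forecast()`, so the example runs without Prophet.

```r
library(purrr)

# Toy data with the same columns as all.df (values are made up for illustration)
toy <- data.frame(
  Date     = rep(seq(as.Date("2017-01-01"), by = "day", length.out = 3), times = 4),
  Customer = rep(c("a", "a", "b", "b"), each = 3),
  Product  = rep(c("xxx", "yyy", "xxx", "yyy"), each = 3),
  Revenue  = 1:12
)

# Hypothetical stand-in for daily_forecast(): just returns the mean of y
toy_forecast <- function(df) mean(df$y)

# Level 1: one element per customer; level 2: one data frame per product
by_customer <- split(toy, toy$Customer)
nested      <- map(by_customer, ~ split(.x, .x$Product))

# Apply a function to every data frame sitting at depth 2
renamed <- map_depth(nested, 2, ~ data.frame(ds = .x$Date, y = .x$Revenue))
res_toy <- map_depth(renamed, 2, toy_forecast)

# Pull out a single result by customer and product name
b_yyy <- pluck(res_toy, "b", "yyy")
```

Swapping `Customer` and `Product` in the two `split()` calls gives the product-then-customer nesting the question asks for. Note that `map()` and `map_depth()` still call the function once per data frame, so this tidies the loop rather than removing the per-group model fit.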

But the following would be more natural to me (keeping everything in a data frame):

res_2 <-
  all.df %>% 
  rename(ds = Date, y = Revenue) %>% 
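  nest(data = c(ds, y)) %>%   # tidyr >= 1.0.0 syntax; older versions used nest(ds, y)
  transmute(Customer, Product, res = map(data, daily_forecast)) %>% 
  unnest(res)                 # name the list-column to unnest explicitly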
# # A tibble: 4,380 × 5
#    Customer Product       Date Actual.Revenue fcast.daily
#      <fctr>  <fctr>     <date>          <int>       <dbl>
# 1         a     xxx 2017-01-01             76    55.93109
# 2         a     xxx 2017-01-02             87    57.92577
# 3         a     xxx 2017-01-03             87    51.92263
# 4         a     xxx 2017-01-04             56    51.86267
# 5         a     xxx 2017-01-05             83    54.04588
# 6         a     xxx 2017-01-06             17    52.75289
# 7         a     xxx 2017-01-07             19    52.35083
# 8         a     xxx 2017-01-08             72    53.91887
# 9         a     xxx 2017-01-09             92    55.81202
# 10        a     xxx 2017-01-10             35    49.78302
# # ... with 4,370 more rows
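
The nest / map / unnest pattern can be tried on toy data without fitting any models. The sketch below assumes tidyr 1.0.0 or later; `toy` and `toy_forecast()` are hypothetical stand-ins for `all.df` and `daily_forecast()`.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data with the same columns as all.df (values are made up for illustration)
toy <- data.frame(
  Date     = rep(seq(as.Date("2017-01-01"), by = "day", length.out = 3), times = 4),
  Customer = rep(c("a", "a", "b", "b"), each = 3),
  Product  = rep(c("xxx", "yyy", "xxx", "yyy"), each = 3),
  Revenue  = 1:12
)

# Hypothetical stand-in for daily_forecast(): returns a one-row summary data frame
toy_forecast <- function(df) data.frame(n_days = nrow(df), mean_y = mean(df$y))

res_toy <- toy %>%
  rename(ds = Date, y = Revenue) %>%
  nest(data = c(ds, y)) %>%               # one row per Customer/Product pair
  mutate(res = map(data, toy_forecast)) %>%
  select(-data) %>%
  unnest(res)                             # expand results back into regular columns
```

Because `Customer` and `Product` stay as ordinary columns, the result can be filtered or joined directly, with no need to walk a nested list.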
