Dplyr groupby name and rolling average from last n events by date


Question

I would like to compute an average over the last 3 events for each person (name). I have a date column, and I would like to use it to pick the 3 most recent events. Some people appear in the data frame fewer times than others, and that's OK.

Code to create the data frame:

library(dplyr)

# Create DataFrame

df<- data.frame(name=c('CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE',
                      'JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH',
                      'JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON',
                      'SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON'
                      ),
               GA=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               SV=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               GF=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               SA=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
               date=c("10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016",
                      "10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016",
                      "10/20/2016","10/19/2016","10/18/2016","10/17/2016",
                      "10/20/2016","10/19/2016","10/18/2016","10/17/2016"
                      ),
               stringsAsFactors = FALSE)

DF:

name          GA  SV  GF  SA  date
CAREY.FAKE    2   2   2   2   10/20/2016
CAREY.FAKE    2   2   2   2   10/19/2016
CAREY.FAKE    2   2   2   2   10/18/2016
CAREY.FAKE    2   2   2   2   10/17/2016
CAREY.FAKE    2   2   2   2   10/16/2016
CAREY.FAKE    20  20  20  20  10/15/2016
JOHN.SMITH    2   2   2   2   10/20/2016
JOHN.SMITH    2   2   2   2   10/19/2016
JOHN.SMITH    2   2   2   2   10/18/2016
JOHN.SMITH    2   2   2   2   10/17/2016
JOHN.SMITH    2   2   2   2   10/16/2016
JOHN.SMITH    20  20  20  20  10/15/2016
JEFF.JOHNSON  2   2   2   2   10/20/2016
JEFF.JOHNSON  2   2   2   2   10/19/2016
JEFF.JOHNSON  2   2   2   2   10/18/2016
JEFF.JOHNSON  20  20  20  20  10/17/2016
SARA.JOHNSON  2   2   2   2   10/20/2016
SARA.JOHNSON  2   2   2   2   10/19/2016
SARA.JOHNSON  2   2   2   2   10/18/2016
SARA.JOHNSON  20  20  20  20  10/17/2016

Code to create the rolling average:

df_next <- df %>%
  group_by(name) %>%
  summarise(last_three_mean = mean(tail(GA,SV,GF,SA, 3)))

Error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: length(n) == 1L is not TRUE.

Desired result:

name          GA  SV  GF  SA
CAREY.FAKE    2   2   2   2
JEFF.JOHNSON  2   2   2   2
JOHN.SMITH    2   2   2   2
SARA.JOHNSON  2   2   2   2

Recommended answer

We can arrange by 'date' (parsed with lubridate's mdy() so the rows sort chronologically) and then use summarise_at to get the mean of the last 3 values of multiple columns after grouping by 'name'.

library(dplyr)
library(lubridate)
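df %>% 
   group_by(name) %>%
   # sort each person's events from oldest to newest
   arrange(name, mdy(date)) %>% 
   # columns 2:5 are GA, SV, GF, SA; funs() is deprecated since dplyr 0.8, so use a ~ lambda
   summarise_at(2:5, ~ mean(tail(., 3)))
   #or select the column by matching the name pattern
   #summarise_at(vars(matches("^[A-Z]{2}$")), ~ mean(tail(., 3)))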
# A tibble: 4 x 5
#  name            GA    SV    GF    SA
#  <chr>        <dbl> <dbl> <dbl> <dbl>
#1 CAREY.FAKE       2     2     2     2
#2 JEFF.JOHNSON     2     2     2     2
#3 JOHN.SMITH       2     2     2     2
#4 SARA.JOHNSON     2     2     2     2
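
For context, the attempt in the question most likely fails because tail() takes one vector plus a scalar n, so in tail(GA,SV,GF,SA, 3) the whole SV column is matched to n. The sketch below shows the same idea with across(), assuming dplyr 1.0.0 or later; the small `toy` data frame and its values are hypothetical and only serve to make the example self-contained.

```r
library(dplyr)

# Hypothetical toy data: person "A" has four events, person "B" only two.
toy <- data.frame(
  name = c("A", "A", "A", "A", "B", "B"),
  GA   = c(2, 2, 2, 20, 4, 6),
  SV   = c(1, 1, 1, 10, 3, 5),
  date = c("10/20/2016", "10/19/2016", "10/18/2016", "10/17/2016",
           "10/20/2016", "10/19/2016"),
  stringsAsFactors = FALSE
)

last_three <- toy %>%
  # Parse the month/day/year strings so sorting is chronological, not alphabetical.
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  arrange(name, date) %>%
  group_by(name) %>%
  # tail(.x, 3) keeps the 3 most recent values of one column at a time;
  # groups with fewer than 3 rows simply use all of their rows.
  summarise(across(c(GA, SV), ~ mean(tail(.x, 3))), .groups = "drop")

print(last_three)
```

Because the lambda is applied to each column separately, tail() only ever receives a single vector, which avoids the length(n) error.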


Another option is to use top_n and then summarise_at:

df %>% 
   group_by(name) %>%
   top_n(mdy(date), n = 3) %>%
   summarise_at(2:5, mean)
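
One caveat with top_n is that it keeps ties, so a person with several events on the same date could contribute more than 3 rows. As a hedged alternative, assuming dplyr 1.0.0 or later (where top_n is superseded), slice_max with with_ties = FALSE returns at most 3 rows per group. The `toy` data frame below is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: person "A" has four events, person "B" only two.
toy <- data.frame(
  name = c("A", "A", "A", "A", "B", "B"),
  GA   = c(2, 2, 2, 20, 4, 6),
  SV   = c(1, 1, 1, 10, 3, 5),
  date = c("10/20/2016", "10/19/2016", "10/18/2016", "10/17/2016",
           "10/20/2016", "10/19/2016"),
  stringsAsFactors = FALSE
)

recent_mean <- toy %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  group_by(name) %>%
  # Keep the 3 rows with the latest dates in each group (fewer if the group is smaller).
  slice_max(order_by = date, n = 3, with_ties = FALSE) %>%
  # Average every measurement column over the rows that were kept.
  summarise(across(c(GA, SV), mean), .groups = "drop")

print(recent_mean)
```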
