每组过去的窗口大小日期的总和 [英] sum over past window-size dates per group

查看:65
本文介绍了每组过去的窗口大小日期的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题类似于

The problem is similar to How do I do a conditional sum which only looks between certain date criteria but slightly different and the answer from that does not fit into current problem. The main difference is that the date column based on each group may not necessarily be complete (i.e., certain date may be missing)

输入:

input <- read.table(text="
2017-04-01     A     1
2017-04-02     B     2
2017-04-02     B     2
2017-04-02     C     2
2017-04-02     A     2
2017-04-03     C     3
2017-04-04     A     4
2017-04-05     B     5
2017-04-06     C     6
2017-04-07     A     7
2017-04-08     B     8
2017-04-09     C     9")
colnames(input) <- c("Date","Group","Score")

规则:对于每个组在每个日期,回溯3个日历日期(包括当前日期).计算总和.

Rule: for each group at each date, looking back 3 calendar dates (include current date). calculate the sum.

预期输出:

    Date Group 3DaysSumPerGroup
    2017-04-01     A                1 #1  previous two dates are not available. partial is allowed
    2017-04-02     A                3 #2+1 both 4-01 and 4-02 are in the range
    2017-04-04     A                6 #4+2
    2017-04-07     A                7 #7
    2017-04-02     B                4 # 2+2 at the same day
    2017-04-05     B                5
    2017-04-08     B                8
    2017-04-02     C                2
    2017-04-03     C                5
    2017-04-06     C                6
    2017-04-09     C                9

我尝试将rollapply与partial = T一起使用,但结果似乎不正确.

I tried to use rollapply with partial=T, but result doesn't seem correct.

 input %>% 
     group_by(Group) %>% 
     arrange(Date) %>% mutate("3DaysSumPerGroup"=rollapply(data=Score,width=3,align="right",FUN=sum,partial=T,fill=NA,rm.na=T))

推荐答案

这是一个使用新的 non-equi joins 和data.table(v1)中的by = .EACHI功能的(据说很有效的)解决方案.9.8 +)

Here's a (supposedly efficient) solution using the new non-equi joins and the by = .EACHI features in data.table (v1.9.8+)

library(data.table) #v1.10.4

## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
  Date = as.IDate(Date)
  Date2 = Date - 2L
  .(Date, Date2)
}]

## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column 

input[unique(input, by = c("Date", "Group")), ## This removes the dupes
      on = .(Group, Date <= Date, Date >= Date2), ## The join condition
      .(Score = sum(Score)), ## sum the scores
      keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))

#     Group       Date       Date Score
#  1:     A 2017-04-01 2017-03-30     1
#  2:     A 2017-04-02 2017-03-31     3
#  3:     A 2017-04-04 2017-04-02     6
#  4:     A 2017-04-07 2017-04-05     7
#  5:     B 2017-04-02 2017-03-31     4
#  6:     B 2017-04-05 2017-04-03     5
#  7:     B 2017-04-08 2017-04-06     8
#  8:     C 2017-04-02 2017-03-31     2
#  9:     C 2017-04-03 2017-04-01     5
# 10:     C 2017-04-06 2017-04-04     6
# 11:     C 2017-04-09 2017-04-07     9

这篇关于每组过去的窗口大小日期的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆