每组过去的窗口大小日期的总和 [英] sum over past window-size dates per group
问题描述
The problem is similar to How do I do a conditional sum which only looks between certain date criteria but slightly different and the answer from that does not fit into current problem. The main difference is that the date column based on each group may not necessarily be complete (i.e., certain date may be missing)
输入:
input <- read.table(text="
2017-04-01 A 1
2017-04-02 B 2
2017-04-02 B 2
2017-04-02 C 2
2017-04-02 A 2
2017-04-03 C 3
2017-04-04 A 4
2017-04-05 B 5
2017-04-06 C 6
2017-04-07 A 7
2017-04-08 B 8
2017-04-09 C 9")
colnames(input) <- c("Date","Group","Score")
规则:对于每个组在每个日期,回溯3个日历日期(包括当前日期).计算总和.
Rule: for each group at each date, looking back 3 calendar dates (include current date). calculate the sum.
预期输出:
Date Group 3DaysSumPerGroup
2017-04-01 A 1 #1 previous two dates are not available. partial is allowed
2017-04-02 A 3 #2+1 both 4-01 and 4-02 are in the range
2017-04-04 A 6 #4+2
2017-04-07 A 7 #7
2017-04-02 B 4 # 2+2 at the same day
2017-04-05 B 5
2017-04-08 B 8
2017-04-02 C 2
2017-04-03 C 5
2017-04-06 C 6
2017-04-09 C 9
我尝试将rollapply与partial = T一起使用,但结果似乎不正确.
I tried to use rollapply with partial=T, but result doesn't seem correct.
input %>%
group_by(Group) %>%
arrange(Date) %>% mutate("3DaysSumPerGroup"=rollapply(data=Score,width=3,align="right",FUN=sum,partial=T,fill=NA,rm.na=T))
推荐答案
这是一个使用新的 non-equi joins 和data.table(v1)中的by = .EACHI
功能的(据说很有效的)解决方案.9.8 +)
Here's a (supposedly efficient) solution using the new non-equi joins and the by = .EACHI
features in data.table (v1.9.8+)
library(data.table) #v1.10.4
## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
Date = as.IDate(Date)
Date2 = Date - 2L
.(Date, Date2)
}]
## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column
input[unique(input, by = c("Date", "Group")), ## This removes the dupes
on = .(Group, Date <= Date, Date >= Date2), ## The join condition
.(Score = sum(Score)), ## sum the scores
keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))
# Group Date Date Score
# 1: A 2017-04-01 2017-03-30 1
# 2: A 2017-04-02 2017-03-31 3
# 3: A 2017-04-04 2017-04-02 6
# 4: A 2017-04-07 2017-04-05 7
# 5: B 2017-04-02 2017-03-31 4
# 6: B 2017-04-05 2017-04-03 5
# 7: B 2017-04-08 2017-04-06 8
# 8: C 2017-04-02 2017-03-31 2
# 9: C 2017-04-03 2017-04-01 5
# 10: C 2017-04-06 2017-04-04 6
# 11: C 2017-04-09 2017-04-07 9
这篇关于每组过去的窗口大小日期的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!