基于R中的条件的窗口中的累积总和(或运行窗口总和) [英] Cumulative sum in a window (or running window sum) based on a condition in R
问题描述
我正在尝试根据条件计算给定窗口的累积总和。我已经看到了解决方案有条件累积总和的线程(为数据帧中的每一行计算R中的条件运行总和)和滚动总和( R中的另一个变量的滚动总和),但我找不到两者。我还看到 data.table
在 R data.table滑动窗口。因此,这个问题对我来说非常具有挑战性。
I am trying to calculate cumulative sum for a given window based on a condition. I have seen threads where the solution does conditional cumulative sum (Calculate a conditional running sum in R for every row in data frame) and rolling sum (Rolling Sum by Another Variable in R), but I couldn't find the two together. I also saw that data.table
doesn't have a rolling window function at R data.table sliding window. So, this problem is very challenging for me.
此外,发布了解决方案由Mike Grahan撰写的关于滚动总和,这超出了我的理解。我正在寻找主要基于速度的基于 data.table
的方法。但是,我愿意接受其他方法,只要它们是可以理解的。
Moreover, the solution posted by Mike Grahan on rolling sum is beyond my comprehension. I am looking for data.table
based method primarily for speed. However, I am open to other methods if they are understandable.
这是我的输入数据:
DFI <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011,
2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010),
Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575,
13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578,
13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4,
3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)), .Names = c("FY",
"Customer", "Product", "Rev"), row.names = c(NA, 17L), class = "data.frame")
这是我的预期输出:(手动创建;如果出现手动错误,我深表歉意。
Here's my expected output: (Manually created; My apologies if there is a manual error)
DFO <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2012,
2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575,
13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575,
13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D",
"E"), Rev = c(4, 3, 3, 1, 2, 3, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2),
cumsum = c(4, 7, 10, 11, 9, 3, 6, 10, 15, 21, 3, 2, 2, 4,
2, 2)), .Names = c("FY", "Customer", "Product", "Rev", "cumsum"
), row.names = c(NA, 16L), class = "data.frame")
一些关于逻辑的评论:
1)我想在5年期限。理想情况下,我希望这5年是可变的,即我可以在代码的其他位置指定的内容。这样,我可以自由地在以后更改窗口进行分析。
1) I want to find rolling sum in a 5-year period. Ideally, I would like this 5-year period to be variable i.e. something I can specify elsewhere in the code. This way, I have the liberty to vary the window later on for my analysis.
2)Window的结束基于最大年份(即上面示例中的 FY
)。在上面的示例中, DFI
中的最大 FY
为 2016
。因此,对于 2016
中的所有条目,窗口的开始年份为 2016-5 +1 = 2012
。
2) The end of Window is based on the maximum year (i.e. FY
in example above). In above example, the max FY
in DFI
is 2016
. So, starting year of the window would be 2016 - 5 + 1 = 2012
for all entries in 2016
.
3)窗口总和(或运行总和)由 Customer
和特定的计算产品
。
3) The window sum (or running sum) is calculated by Customer
and for a specific Product
.
我尝试了什么:
我想在发布前尝试一下。这是我的代码:
I wanted to try something before posting. Here's my code:
DFI <- data.table::as.data.table(DFI)
#Sort it first
DFI<-DFI[order(Customer,FY),]
#find cumulative sum; remove Rev column; order rows
DFOTest<-DFI[,cumsum := cumsum(Rev),by=.(Customer,Product)][,.SD[which.max(cumsum)],by=.(FY,Customer,Product)][,("Rev"):=NULL][order(Customer,Product,FY)]
此代码计算累计和,但是我无法定义5年窗口,然后再计算运行总和。我有两个问题:
This code calculates the cumulative sum, but I am unable to define 5-year window and then calculate running sum. I have two questions:
问题1)如何计算5年的总和?
Question 1) How do I calculate a 5-year running sum?
问题2)?有人可以解释 Mike在此线程上的方法吗?好像很快。但是,我不太确定那里发生了什么。我确实看到有人要求发表评论,但是我不确定它是否不言自明。
Question 2) Can someone please explain Mike's method on this thread ? It seems to be fast. However, I am not really sure what's going on there. I did see that someone requested some commentary, but I am not sure whether it is self-explanatory.
在此先感谢。我在这个问题上苦苦挣扎了两天。
Thanks in advance. I have been struggling on this problem for two days.
推荐答案
1)rollapply 创建一个 Sum
函数,该函数将 FY
和 Rev
作为2列矩阵(或(如果没有),然后将上一年 k
以内的那几年的收入相加。然后将 DFI
转换为数据表,对具有相同客户/产品/年份的行求和,并运行 rollapplyr
和每个客户/产品组的总和。
1) rollapply Create a Sum
function which takes FY
and Rev
as a 2 column matrix (or if not makes it one) and then sums the revenues for those years within k
of the last year. Then convert DFI
to a data table, sum rows having the same Customer/Product/Year and run rollapplyr
with Sum
for each Customer/Product group.
library(data.table)
library(zoo)
k <- 5
Sum <- function(x) {
x <- matrix(x,, 2)
FY <- x[, 1]
Rev <- x[, 2]
ok <- FY >= tail(FY, 1) - k + 1
sum(Rev[ok])
}
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
by = c("Customer", "Product"), .SDcols = c("FY", "Rev")]
给予:
> DT
Customer Product FY Rev cumsum
1: 13575 A 2011 4 4
2: 13575 A 2012 3 7
3: 13575 A 2013 3 10
4: 13575 A 2015 1 11
5: 13575 A 2016 2 9
6: 13575 B 2011 3 3
7: 13575 B 2012 3 6
8: 13575 B 2013 4 10
9: 13575 B 2014 5 15
10: 13575 B 2015 6 21
11: 13578 A 2010 3 3
12: 13578 A 2016 2 2
13: 13578 B 2013 2 2
14: 13578 C 2014 4 4
15: 13578 D 2015 2 2
16: 13578 E 2010 2 2
2)仅数据表
具有相同首行的行客户/产品/ FY,然后按客户/产品分组,对于每个FY值 fy
,选择 Rev
FY值介于 fy-k + 1
和 fy
和总和。
First sum rows that have the same Customer/Product/FY and then, grouping by Customer/Product, for each FY value, fy
, pick out the Rev
values whose FY values are between fy-k+1
and fy
and sum.
library(data.table)
k <- 5
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy-k+1, fy)])),
by = c("Customer", "Product")]
给予:
> DT
Customer Product FY Rev cumsum
1: 13575 A 2011 4 4
2: 13575 A 2012 3 7
3: 13575 A 2013 3 10
4: 13575 A 2015 1 11
5: 13575 A 2016 2 9
6: 13575 B 2011 3 3
7: 13575 B 2012 3 6
8: 13575 B 2013 4 10
9: 13575 B 2014 5 15
10: 13575 B 2015 6 21
11: 13578 A 2010 3 3
12: 13578 A 2016 2 2
13: 13578 B 2013 2 2
14: 13578 C 2014 4 4
15: 13578 D 2015 2 2
16: 13578 E 2010 2 2
这篇关于基于R中的条件的窗口中的累积总和(或运行窗口总和)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!