基于R中的条件的窗口中的累积总和(或运行窗口总和) [英] Cumulative sum in a window (or running window sum) based on a condition in R

查看:81
本文介绍了基于R中的条件的窗口中的累积总和(或运行窗口总和)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试根据条件计算给定窗口的累积总和。我已经看到了解决方案有条件累积总和的线程(为数据帧中的每一行计算R中的条件运行总和)和滚动总和( R中的另一个变量的滚动总和),但我找不到两者。我还看到 data.table R data.table滑动窗口。因此,这个问题对我来说非常具有挑战性。

I am trying to calculate cumulative sum for a given window based on a condition. I have seen threads where the solution does conditional cumulative sum (Calculate a conditional running sum in R for every row in data frame) and rolling sum (Rolling Sum by Another Variable in R), but I couldn't find the two together. I also saw that data.table doesn't have a rolling window function at R data.table sliding window. So, this problem is very challenging for me.

此外,发布了解决方案由Mike Grahan撰写的关于滚动总和,这超出了我的理解。我正在寻找主要基于速度的基于 data.table 的方法。但是,我愿意接受其他方法,只要它们是可以理解的。

Moreover, the solution posted by Mike Grahan on rolling sum is beyond my comprehension. I am looking for data.table based method primarily for speed. However, I am open to other methods if they are understandable.

这是我的输入数据:

DFI <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011, 
2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), 
    Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575, 
    13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578, 
    13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B", 
    "B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4, 
    3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)), .Names = c("FY", 
"Customer", "Product", "Rev"), row.names = c(NA, 17L), class = "data.frame")

这是我的预期输出:(手动创建;如果出现手动错误,我深表歉意。

Here's my expected output: (Manually created; My apologies if there is a manual error)

DFO <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2012, 
2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575, 
13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 
13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A", 
"A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D", 
"E"), Rev = c(4, 3, 3, 1, 2, 3, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2), 
    cumsum = c(4, 7, 10, 11, 9, 3, 6, 10, 15, 21, 3, 2, 2, 4, 
    2, 2)), .Names = c("FY", "Customer", "Product", "Rev", "cumsum"
), row.names = c(NA, 16L), class = "data.frame")

一些关于逻辑的评论:

1)我想在5年期限。理想情况下,我希望这5年是可变的,即我可以在代码的其他位置指定的内容。这样,我可以自由地在以后更改窗口进行分析。

1) I want to find rolling sum in a 5-year period. Ideally, I would like this 5-year period to be variable i.e. something I can specify elsewhere in the code. This way, I have the liberty to vary the window later on for my analysis.

2)Window的结束基于最大年份(即上面示例中的 FY )。在上面的示例中, DFI 中的最大 FY 2016 。因此,对于 2016 中的所有条目,窗口的开始年份为 2016-5 +1 = 2012

2) The end of Window is based on the maximum year (i.e. FY in example above). In above example, the max FY in DFI is 2016. So, starting year of the window would be 2016 - 5 + 1 = 2012 for all entries in 2016.

3)窗口总和(或运行总和)由 Customer 和特定的计算产品

3) The window sum (or running sum) is calculated by Customer and for a specific Product.

我尝试了什么:

我想在发布前尝试一下。这是我的代码:

I wanted to try something before posting. Here's my code:

  DFI <- data.table::as.data.table(DFI)

  #Sort it first
  DFI<-DFI[order(Customer,FY),]

  #find cumulative sum; remove Rev column; order rows
  DFOTest<-DFI[,cumsum := cumsum(Rev),by=.(Customer,Product)][,.SD[which.max(cumsum)],by=.(FY,Customer,Product)][,("Rev"):=NULL][order(Customer,Product,FY)]

此代码计算累计和,但是我无法定义5年窗口,然后再计算运行总和。我有两个问题:

This code calculates the cumulative sum, but I am unable to define 5-year window and then calculate running sum. I have two questions:

问题1)如何计算5年的总和?

Question 1) How do I calculate a 5-year running sum?

问题2)?有人可以解释 Mike在此线程上的方法吗?好像很快。但是,我不太确定那里发生了什么。我确实看到有人要求发表评论,但是我不确定它是否不言自明。

Question 2) Can someone please explain Mike's method on this thread ? It seems to be fast. However, I am not really sure what's going on there. I did see that someone requested some commentary, but I am not sure whether it is self-explanatory.

在此先感谢。我在这个问题上苦苦挣扎了两天。

Thanks in advance. I have been struggling on this problem for two days.

推荐答案

1)rollapply 创建一个 Sum 函数,该函数将 FY Rev 作为2列矩阵(或(如果没有),然后将上一年 k 以内的那几年的收入相加。然后将 DFI 转换为数据表,对具有相同客户/产品/年份的行求和,并运行 rollapplyr 每个客户/产品组的总和。

1) rollapply Create a Sum function which takes FY and Rev as a 2 column matrix (or if not makes it one) and then sums the revenues for those years within k of the last year. Then convert DFI to a data table, sum rows having the same Customer/Product/Year and run rollapplyr with Sum for each Customer/Product group.

library(data.table)
library(zoo)

k <- 5
Sum <- function(x) {
  x <- matrix(x,, 2)
  FY <- x[, 1]
  Rev <- x[, 2]
  ok <- FY >= tail(FY, 1) - k + 1
  sum(Rev[ok])
}
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
       by = c("Customer", "Product"), .SDcols = c("FY", "Rev")]

给予:

 > DT
    Customer Product   FY Rev cumsum
 1:    13575       A 2011   4      4
 2:    13575       A 2012   3      7
 3:    13575       A 2013   3     10
 4:    13575       A 2015   1     11
 5:    13575       A 2016   2      9
 6:    13575       B 2011   3      3
 7:    13575       B 2012   3      6
 8:    13575       B 2013   4     10
 9:    13575       B 2014   5     15
10:    13575       B 2015   6     21
11:    13578       A 2010   3      3
12:    13578       A 2016   2      2
13:    13578       B 2013   2      2
14:    13578       C 2014   4      4
15:    13578       D 2015   2      2
16:    13578       E 2010   2      2

2)仅数据表

具有相同首行的行客户/产品/ FY,然后按客户/产品分组,对于每个FY值 fy ,选择 Rev FY值介于 fy-k + 1 fy 和总和。

First sum rows that have the same Customer/Product/FY and then, grouping by Customer/Product, for each FY value, fy, pick out the Rev values whose FY values are between fy-k+1 and fy and sum.

library(data.table)

k <- 5
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy-k+1, fy)])),
       by = c("Customer", "Product")]

给予:

> DT
    Customer Product   FY Rev cumsum
 1:    13575       A 2011   4      4
 2:    13575       A 2012   3      7
 3:    13575       A 2013   3     10
 4:    13575       A 2015   1     11
 5:    13575       A 2016   2      9
 6:    13575       B 2011   3      3
 7:    13575       B 2012   3      6
 8:    13575       B 2013   4     10
 9:    13575       B 2014   5     15
10:    13575       B 2015   6     21
11:    13578       A 2010   3      3
12:    13578       A 2016   2      2
13:    13578       B 2013   2      2
14:    13578       C 2014   4      4
15:    13578       D 2015   2      2
16:    13578       E 2010   2      2

这篇关于基于R中的条件的窗口中的累积总和(或运行窗口总和)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆