基于 R 中的条件的窗口中的累积总和(或运行窗口总和) [英] Cumulative sum in a window (or running window sum) based on a condition in R

查看:18
本文介绍了基于 R 中的条件的窗口中的累积总和(或运行窗口总和)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试根据条件计算给定窗口的累积总和.我见过解决方案执行条件累积总和的线程(为数据框中的每一行计算 R 中的条件运行总和)和滚动总和(R 中另一个变量的滚动求和),但我找不到两者在一起.我还在 data.table 没有滚动窗口功能>R data.table 滑动窗口.所以,这个问题对我来说很有挑战性.

I am trying to calculate cumulative sum for a given window based on a condition. I have seen threads where the solution does conditional cumulative sum (Calculate a conditional running sum in R for every row in data frame) and rolling sum (Rolling Sum by Another Variable in R), but I couldn't find the two together. I also saw that data.table doesn't have a rolling window function at R data.table sliding window. So, this problem is very challenging for me.

此外,Mike Grahan 发布的关于滚动总和的解决方案超出了我的理解.我正在寻找基于 data.table 的方法,主要是为了速度.但是,如果可以理解,我对其他方法持开放态度.

Moreover, the solution posted by Mike Grahan on rolling sum is beyond my comprehension. I am looking for data.table based method primarily for speed. However, I am open to other methods if they are understandable.

这是我的输入数据:

DFI <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011, 
2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), 
    Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575, 
    13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578, 
    13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B", 
    "B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4, 
    3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)), .Names = c("FY", 
"Customer", "Product", "Rev"), row.names = c(NA, 17L), class = "data.frame")

这是我的预期输出:(手动创建;如果出现手动错误,我深表歉意)

Here's my expected output: (Manually created; My apologies if there is a manual error)

DFO <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2012, 
2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575, 
13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 
13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A", 
"A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D", 
"E"), Rev = c(4, 3, 3, 1, 2, 3, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2), 
    cumsum = c(4, 7, 10, 11, 9, 3, 6, 10, 15, 21, 3, 2, 2, 4, 
    2, 2)), .Names = c("FY", "Customer", "Product", "Rev", "cumsum"
), row.names = c(NA, 16L), class = "data.frame")

关于逻辑的一些评论:

1) 我想在 5 年内找到滚动总和.理想情况下,我希望这个 5 年期间是可变的,即我可以在代码的其他地方指定的东西.通过这种方式,我可以自由地稍后更改窗口以进行分析.

1) I want to find rolling sum in a 5-year period. Ideally, I would like this 5-year period to be variable i.e. something I can specify elsewhere in the code. This way, I have the liberty to vary the window later on for my analysis.

2) Window 的结束基于最大年份(即上面示例中的 FY).在上面的例子中,DFI 中的最大 FY2016.因此,对于 2016 中的所有条目,窗口的起始年份将为 2016 - 5 + 1 = 2012.

2) The end of Window is based on the maximum year (i.e. FY in example above). In above example, the max FY in DFI is 2016. So, starting year of the window would be 2016 - 5 + 1 = 2012 for all entries in 2016.

3) 窗口总和(或运行总和)由 Customer 和特定 Product 计算.

3) The window sum (or running sum) is calculated by Customer and for a specific Product.

我的尝试:

我想在发布之前尝试一些东西.这是我的代码:

I wanted to try something before posting. Here's my code:

  DFI <- data.table::as.data.table(DFI)

  #Sort it first
  DFI<-DFI[order(Customer,FY),]

  #find cumulative sum; remove Rev column; order rows
  DFOTest<-DFI[,cumsum := cumsum(Rev),by=.(Customer,Product)][,.SD[which.max(cumsum)],by=.(FY,Customer,Product)][,("Rev"):=NULL][order(Customer,Product,FY)]

此代码计算累积总和,但我无法定义 5 年窗口,然后计算运行总和.我有两个问题:

This code calculates the cumulative sum, but I am unable to define 5-year window and then calculate running sum. I have two questions:

问题 1) 如何计算 5 年的总和?

Question 1) How do I calculate a 5-year running sum?

问题 2) 有人可以解释一下 Mike 在此线程上的方法吗?它似乎很快.但是,我不确定那里发生了什么.我确实看到有人要求提供一些评论,但我不确定它是否不言自明.

Question 2) Can someone please explain Mike's method on this thread ? It seems to be fast. However, I am not really sure what's going on there. I did see that someone requested some commentary, but I am not sure whether it is self-explanatory.

提前致谢.这两天我一直在为这个问题苦苦挣扎.

Thanks in advance. I have been struggling on this problem for two days.

推荐答案

1) rollapply 创建一个 Sum 函数,它接受 FYRev 作为一个 2 列矩阵(或者如果不是一个矩阵),然后将上一年 k 内这些年的收入相加.然后将 DFI 转换为数据表,对具有相同 Customer/Product/Year 的行求和,并为每个 Customer/Product 组运行 rollapplyrSum.

1) rollapply Create a Sum function which takes FY and Rev as a 2 column matrix (or if not makes it one) and then sums the revenues for those years within k of the last year. Then convert DFI to a data table, sum rows having the same Customer/Product/Year and run rollapplyr with Sum for each Customer/Product group.

library(data.table)
library(zoo)

k <- 5
Sum <- function(x) {
  x <- matrix(x,, 2)
  FY <- x[, 1]
  Rev <- x[, 2]
  ok <- FY >= tail(FY, 1) - k + 1
  sum(Rev[ok])
}
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
       by = c("Customer", "Product"), .SDcols = c("FY", "Rev")]

给予:

 > DT
    Customer Product   FY Rev cumsum
 1:    13575       A 2011   4      4
 2:    13575       A 2012   3      7
 3:    13575       A 2013   3     10
 4:    13575       A 2015   1     11
 5:    13575       A 2016   2      9
 6:    13575       B 2011   3      3
 7:    13575       B 2012   3      6
 8:    13575       B 2013   4     10
 9:    13575       B 2014   5     15
10:    13575       B 2015   6     21
11:    13578       A 2010   3      3
12:    13578       A 2016   2      2
13:    13578       B 2013   2      2
14:    13578       C 2014   4      4
15:    13578       D 2015   2      2
16:    13578       E 2010   2      2

2) 仅限数据表

首先汇总具有相同客户/产品/FY 的行,然后按客户/产品分组,对于每个 FY 值,fy,选择 Rev 值其 FY 值介于 fy-k+1fy 和 sum 之间.

First sum rows that have the same Customer/Product/FY and then, grouping by Customer/Product, for each FY value, fy, pick out the Rev values whose FY values are between fy-k+1 and fy and sum.

library(data.table)

k <- 5
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy-k+1, fy)])),
       by = c("Customer", "Product")]

给予:

> DT
    Customer Product   FY Rev cumsum
 1:    13575       A 2011   4      4
 2:    13575       A 2012   3      7
 3:    13575       A 2013   3     10
 4:    13575       A 2015   1     11
 5:    13575       A 2016   2      9
 6:    13575       B 2011   3      3
 7:    13575       B 2012   3      6
 8:    13575       B 2013   4     10
 9:    13575       B 2014   5     15
10:    13575       B 2015   6     21
11:    13578       A 2010   3      3
12:    13578       A 2016   2      2
13:    13578       B 2013   2      2
14:    13578       C 2014   4      4
15:    13578       D 2015   2      2
16:    13578       E 2010   2      2

这篇关于基于 R 中的条件的窗口中的累积总和(或运行窗口总和)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆