Conditional cumsum with reset
Question
I have a data frame that is already sorted as needed, and now I would like to "slice" it into groups.
Each group should have a maximum cumulative value of 10. When the cumulative value would exceed 10, the cumulative sum should reset and start over.
library(dplyr)
id <- sample(1:15)
order <- 1:15
value <- c(4, 5, 7, 3, 8, 1, 2, 5, 3, 6, 2, 6, 3, 1, 4)
df <- data.frame(id, order, value)
df
This is the output I'm looking for (I did it "manually"):
cumsum_10 <- c(4, 9, 7, 10, 8, 9, 2, 7, 10, 6, 8, 6, 9, 10, 4)
group_10 <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7)
df1 <- data.frame(df, cumsum_10, group_10)
df1
So I have two questions:
- How do I create a cumulative variable that resets every time it passes an upper limit (10 in this case)?
- How do I count/number each group?
For the first part I tried some combinations of group_by and cumsum, with no luck:
df1 <- df %>% group_by(cumsum(c(FALSE, value < 10)))
I would prefer a pipe (%>%) solution instead of a for loop.
Thanks
Recommended answer
I think this is not easily vectorizable... at least I do not know how.
You can do it by hand via:
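my_cumsum <- function(x) {
  # grp[i] is the group of element i; x is overwritten with the running sum
  grp <- integer(length(x))
  grp[1] <- 1L
  # seq_along(x)[-1] is empty for a length-1 input, unlike 2:length(x)
  for (i in seq_along(x)[-1]) {
    if (x[i - 1] + x[i] <= 10) {
      # still within the cap of 10: stay in the group and keep accumulating
      grp[i] <- grp[i - 1]
      x[i] <- x[i - 1] + x[i]
    } else {
      # cap exceeded: open a new group; x[i] restarts the running sum
      grp[i] <- grp[i - 1] + 1L
    }
  }
  data.frame(grp, x)
}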
For your data this gives:
> my_cumsum(df$value)
grp x
1 1 4
2 1 9
3 2 7
4 2 10
5 3 8
6 3 9
7 4 2
8 4 7
9 4 10
10 5 6
11 5 8
12 6 6
13 6 9
14 6 10
15 7 4
For my "counter-example" this also gives the expected result:
> my_cumsum(c(10,6,4))
grp x
1 1 10
2 2 6
3 2 10
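Since the question asks for a pipe-friendly solution, the same recurrence can also be written without an explicit loop using base R's Reduce(..., accumulate = TRUE) inside mutate. This is only a sketch and not part of the original answer: cumsum_reset and its limit argument are hypothetical names, and the group-numbering step assumes every value is strictly positive.

```r
library(dplyr)

# Running sum that restarts whenever adding the next value would exceed `limit`.
# `cumsum_reset` and `limit` are illustrative names, not from the original answer.
cumsum_reset <- function(x, limit = 10) {
  Reduce(function(acc, v) if (acc + v <= limit) acc + v else v,
         x, accumulate = TRUE)
}

value <- c(4, 5, 7, 3, 8, 1, 2, 5, 3, 6, 2, 6, 3, 1, 4)
df <- data.frame(order = seq_along(value), value)

res <- df %>%
  mutate(
    cumsum_10 = cumsum_reset(value),
    # A reset happened wherever the running sum equals the raw value
    # (assumption: all values are strictly positive).
    group_10 = cumsum(cumsum_10 == value)
  )
res
```

Reduce still walks the vector one element at a time, so this is a convenience for use in pipelines rather than a true vectorization.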
As @Khashaa pointed out, this can be implemented more efficiently via Rcpp. He linked to the answer "How to speed up or vectorize a for loop?", which I find very useful.
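A minimal sketch of what an Rcpp version of the loop might look like, assuming the Rcpp package and a working C++ toolchain are installed. The function name cumsum_reset_cpp and the limit parameter are illustrative and are not taken from @Khashaa's comment or from the linked answer.

```r
library(Rcpp)

# Compile a small C++ function from an inline string (hypothetical name).
cppFunction('
DataFrame cumsum_reset_cpp(NumericVector x, double limit = 10.0) {
  int n = x.size();
  NumericVector acc(n);   // running sum with resets
  IntegerVector grp(n);   // group number of each element
  if (n > 0) {
    acc[0] = x[0];
    grp[0] = 1;
  }
  for (int i = 1; i < n; ++i) {
    if (acc[i - 1] + x[i] <= limit) {
      // still within the cap: keep accumulating in the same group
      acc[i] = acc[i - 1] + x[i];
      grp[i] = grp[i - 1];
    } else {
      // cap exceeded: restart the sum and open a new group
      acc[i] = x[i];
      grp[i] = grp[i - 1] + 1;
    }
  }
  return DataFrame::create(Named("grp") = grp, Named("x") = acc);
}')

out <- cumsum_reset_cpp(c(10, 6, 4))
out
```

The logic mirrors my_cumsum step by step; the only intended difference is that the loop runs in compiled code, which mainly matters for long vectors.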