dplyr / R cumulative sum with reset

Problem description

I'd like to use dplyr to generate a cumulative sum that resets once the running sum reaches or exceeds some threshold. In the example below, I want the cumulative sum of column 'a'.

library(dplyr)
library(tibble)

tib <- tibble(
  t = c(1,2,3,4,5,6),
  a = c(2,3,1,2,2,3)
)

# what I want
## thresh = 5
# A tibble: 6 x 4
#         t     a     g     c
#      <dbl> <dbl> <int> <dbl>
#   1  1.00  2.00     0  2.00
#   2  2.00  3.00     0  5.00
#   3  3.00  1.00     1  1.00
#   4  4.00  2.00     1  3.00
#   5  5.00  2.00     1  5.00
#   6  6.00  3.00     2  3.00

# what I want
## thresh = 4
# A tibble: 6 x 4
#         t     a     g     c
#      <dbl> <dbl> <int> <dbl>
#   1  1.00  2.00     0  2.00
#   2  2.00  3.00     0  5.00
#   3  3.00  1.00     1  1.00
#   4  4.00  2.00     1  3.00
#   5  5.00  2.00     1  5.00
#   6  6.00  3.00     2  3.00

# what I want
## thresh = 6
# A tibble: 6 x 4
#         t     a     g     c
#      <dbl> <dbl> <int> <dbl>
#   1  1.00  2.00     0  2.00
#   2  2.00  3.00     0  5.00
#   3  3.00  1.00     0  6.00
#   4  4.00  2.00     1  2.00
#   5  5.00  2.00     1  4.00
#   6  6.00  3.00     1  7.00

I've looked at many of the similar questions here (such as "resetting cumsum if value goes to negative in r") and got what I hoped was close, but it does not give the result I want.

I've tried variants of

thresh <- 5
tib %>%
  group_by(g = cumsum(lag(cumsum(a) >= thresh, default = FALSE))) %>%
  mutate(c = cumsum(a)) %>%
  ungroup()

which returns

# A tibble: 6 x 4
      t     a     g     c
  <dbl> <dbl> <int> <dbl>
1  1.00  2.00     0  2.00
2  2.00  3.00     0  5.00
3  3.00  1.00     1  1.00
4  4.00  2.00     2  2.00
5  5.00  2.00     3  2.00
6  6.00  3.00     4  3.00

You can see that the "group" is not getting reset after the first time.
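
One way to see why the grouping never settles is to print the intermediate vectors. The sketch below uses only base R and rebuilds the `g` expression step by step. The variable names are illustrative, and `lagged` is a base-R stand-in for `dplyr::lag(..., default = FALSE)`. Because `cumsum(a)` here is the global running total, which never resets, every row after the first crossing starts a new group.

```r
a <- c(2, 3, 1, 2, 2, 3)
thresh <- 5

running <- cumsum(a)          # global running total, never reset
crossed <- running >= thresh  # stays TRUE after the first crossing

# Shift by one row, filling the first slot with FALSE
# (stand-in for dplyr::lag(crossed, default = FALSE)).
lagged <- c(FALSE, head(crossed, -1))

g <- cumsum(lagged)           # increases on every row after the crossing

print(running)  # 2 5 6 8 10 13
print(g)        # 0 0 1 2 3 4
```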

Solution

I think accumulate() from purrr can help here. I've also written a wrapper function so the same code can be used with different thresholds.

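library(purrr)  # provides accumulate()

# Build a function that cumsums x, restarting after the sum reaches thresh.
sum_reset_at <- function(thresh) {
  function(x) {
    # .x is the running sum so far, .y is the next value
    accumulate(x, ~if_else(.x >= thresh, .y, .x + .y))
  }
}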

tib %>% mutate(c = sum_reset_at(5)(a))
#       t     a     c
#   <dbl> <dbl> <dbl>
# 1     1     2     2
# 2     2     3     5
# 3     3     1     1
# 4     4     2     3
# 5     5     2     5
# 6     6     3     3
tib %>% mutate(c = sum_reset_at(4)(a))
#       t     a     c
#   <dbl> <dbl> <dbl>
# 1     1     2     2
# 2     2     3     5
# 3     3     1     1
# 4     4     2     3
# 5     5     2     5
# 6     6     3     3
tib %>% mutate(c = sum_reset_at(6)(a))
#       t     a     c
#   <dbl> <dbl> <dbl>
# 1     1     2     2
# 2     2     3     5
# 3     3     1     6
# 4     4     2     2
# 5     5     2     4
# 6     6     3     7
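
The answer above returns only `c`, while the desired output in the question also has a group column `g`. As a hedged sketch, `g` can be recovered from `c` by counting how many earlier rows had already reached the threshold. The version below uses base R's `Reduce(..., accumulate = TRUE)` in place of `purrr::accumulate()` so that it runs without extra packages. `cumsum_reset` and `reset_group` are names invented for this illustration and are not part of dplyr or purrr.

```r
# Running sum that restarts on the row after it reaches `thresh`.
cumsum_reset <- function(x, thresh) {
  Reduce(function(acc, y) if (acc >= thresh) y else acc + y,
         x, accumulate = TRUE)
}

# Group id: number of earlier rows whose running sum had reached `thresh`.
reset_group <- function(running, thresh) {
  cumsum(c(FALSE, head(running >= thresh, -1)))
}

a <- c(2, 3, 1, 2, 2, 3)
c5 <- cumsum_reset(a, 5)   # 2 5 1 3 5 3
g5 <- reset_group(c5, 5)   # 0 0 1 1 1 2
print(data.frame(a = a, g = g5, c = c5))
```

Inside the dplyr pipeline from the answer, the same idea would presumably read `mutate(c = sum_reset_at(5)(a), g = cumsum(lag(c >= 5, default = FALSE)))`, since `mutate()` lets later expressions refer to columns created earlier in the same call.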
