Conditional running count (cumulative sum) with reset in R (dplyr)

Question

I'm trying to calculate a running count (i.e., cumulative sum) that is conditional on other variables and that can reset for particular values on another variable. I'm working in R and would prefer a dplyr-based solution, if possible.

I'd like to create a variable for the running count, cumulative, based on the following algorithm:


  • Calculate the running count (cumulative) within combinations of id and age
  • Increment running count (cumulative) by 1 for every subsequent trial where accuracy = 0, block = 2, and condition = 1
  • Reset running count (cumulative) to 0 for each trial where accuracy = 1, block = 2, and condition = 1, and the next increment resumes at 1 (not the previous number)
  • For each trial where block != 2, or condition != 1, leave the running count (cumulative) as NA
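
The rules above can be sketched as an explicit loop in base R. This is only an illustration of the intended behaviour, not code from the original post: running_count is a hypothetical helper name, and it assumes it receives the trials of a single id/age combination, already in trial order.

```r
# Hypothetical helper (not from the original post): applies the rules above
# to the trials of ONE id/age combination, in trial order.
running_count <- function(accuracy, block, condition) {
  # NA unless the trial has block == 2 and condition == 1
  out <- rep(NA_real_, length(accuracy))
  count <- 0
  for (i in seq_along(accuracy)) {
    if (block[i] == 2 && condition[i] == 1) {
      # accuracy 0 extends the running count; accuracy 1 resets it to 0
      count <- if (accuracy[i] == 0) count + 1 else 0
      out[i] <- count
    }
  }
  out
}

running_count(accuracy  = c(0, 0, 1, 0),
              block     = c(1, 2, 2, 2),
              condition = c(1, 1, 1, 1))
# → NA 1 0 1
```

Trials that are left as NA do not change the count, so the next counted trial continues from the previous value.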

Here's a minimal working example:

mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
                 age = c(1,1,1,1,1,1,1,1,1,1,2),
                 block = c(1,1,2,2,2,2,2,2,2,2,2),
                 trial = c(1,2,1,2,3,4,5,6,7,8,1),
                 condition = c(1,1,1,1,1,2,1,1,1,1,1),
                 accuracy = c(0,0,0,0,0,0,0,1,0,0,0)
)

id  age block   trial   condition   accuracy
1   1   1       1       1           0
1   1   1       2       1           0
1   1   2       1       1           0
1   1   2       2       1           0
1   1   2       3       1           0
1   1   2       4       2           0
1   1   2       5       1           0
1   1   2       6       1           1
1   1   2       7       1           0
1   1   2       8       1           0
1   2   2       1       1           0

The expected output is:

id  age block   trial   condition   accuracy    cumulative
1   1   1       1       1           0           NA
1   1   1       2       1           0           NA
1   1   2       1       1           0           1
1   1   2       2       1           0           2
1   1   2       3       1           0           3
1   1   2       4       2           0           NA
1   1   2       5       1           0           4
1   1   2       6       1           1           0
1   1   2       7       1           0           1
1   1   2       8       1           0           2
1   2   2       1       1           0           1


Recommended answer

We can use case_when to assign the value we need based on our conditions. We then add an additional group_by condition using cumsum, which starts a new group whenever the temp column is 0. In the final mutate step we temporarily replace the NA values in temp with 0, take cumsum over it, and then put the NA values back in their place to get the final output.
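
To make the grouping trick easier to follow, here is a small base R sketch of the same three steps on a single vector. The values of temp are made up for illustration, and ave() stands in for the grouped mutate; it is not part of the answer's code.

```r
# Illustrative values only: 1 = increment, 0 = reset, NA = trial not counted
temp <- c(NA, 1, 1, NA, 1, 0, 1, 1)

# Step 1: a new group starts at every reset (temp == 0); NA never starts one
is_reset <- !is.na(temp) & temp == 0
group <- cumsum(is_reset)
# → 0 0 0 0 0 1 1 1

# Step 2: treat NA as 0 so it does not break the cumulative sum,
# then take the cumulative sum separately within each group
filled <- replace(temp, is.na(temp), 0)
cumulative <- ave(filled, group, FUN = cumsum)
# → 0 1 2 2 3 0 1 2

# Step 3: put NA back where the trial was not counted
cumulative[is.na(temp)] <- NA
cumulative
# → NA 1 2 NA 3 0 1 2
```

Because the reset row itself carries the value 0, each new group starts its cumulative sum at 0 and the next increment resumes at 1.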

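library(dplyr)

mydata %>%
    group_by(id, age) %>%
    # temp: 1 = increment, 0 = reset, NA = trial is not counted
    mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1,
                            accuracy == 1 & block == 2 & condition == 1 ~ 0,
                            TRUE ~ NA_real_)) %>%
    ungroup() %>%
    # start a new group at every reset (temp == 0)
    group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
    # count within each group, then restore NA for the uncounted trials
    mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                                is.na(temp), NA)) %>%
    # group is still a grouping variable here, so dplyr keeps it in the output
    select(-temp, -group)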


#    group    id   age block trial condition accuracy cumulative
#   <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>      <dbl>
# 1     0     1     1     1     1         1        0         NA
# 2     0     1     1     1     2         1        0         NA
# 3     0     1     1     2     1         1        0          1
# 4     0     1     1     2     2         1        0          2
# 5     0     1     1     2     3         1        0          3
# 6     0     1     1     2     4         2        0         NA
# 7     0     1     1     2     5         1        0          4
# 8     1     1     1     2     6         1        1          0
# 9     1     1     1     2     7         1        0          1
#10     1     1     1     2     8         1        0          2
#11     1     1     2     2     1         1        0          1
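
The printed result still contains the helper column group, because select() does not drop a variable that the data is currently grouped by. A minimal variation, assuming dplyr is installed, is to call ungroup() before select(). The name result is introduced here only for illustration; the rest mirrors the pipeline above.

```r
library(dplyr)

mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
                     age = c(1,1,1,1,1,1,1,1,1,1,2),
                     block = c(1,1,2,2,2,2,2,2,2,2,2),
                     trial = c(1,2,1,2,3,4,5,6,7,8,1),
                     condition = c(1,1,1,1,1,2,1,1,1,1,1),
                     accuracy = c(0,0,0,0,0,0,0,1,0,0,0))

result <- mydata %>%
    # temp: 1 = increment, 0 = reset, NA = trial is not counted
    mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1,
                            accuracy == 1 & block == 2 & condition == 1 ~ 0,
                            TRUE ~ NA_real_)) %>%
    # start a new group at every reset (temp == 0)
    group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
    # count within each group, then restore NA for the uncounted trials
    mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                                is.na(temp), NA)) %>%
    # drop the grouping first so the helper columns can really be removed
    ungroup() %>%
    select(-temp, -group)

result$cumulative
# → NA NA 1 2 3 NA 4 0 1 2 1
```

The first group_by(id, age) and ungroup() pair is left out in this sketch because case_when works row by row and gives the same temp column with or without grouping.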
