Conditional running count (cumulative sum) with reset in R (dplyr)

Problem description

I'm trying to calculate a running count (i.e., cumulative sum) that is conditional on other variables and that can reset for particular values on another variable. I'm working in R and would prefer a dplyr-based solution, if possible.

I'd like to create a variable for the running count, cumulative, based on the following algorithm:

  • Calculate the running count (cumulative) within combinations of id and age
  • Increment running count (cumulative) by 1 for every subsequent trial where accuracy = 0, block = 2, and condition = 1
  • Reset running count (cumulative) to 0 for each trial where accuracy = 1, block = 2, and condition = 1, and the next increment resumes at 1 (not the previous number)
  • For each trial where block != 2, or condition != 1, leave the running count (cumulative) as NA
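
The rules above can be spelled out as an explicit loop for a single id/age group. This is only a reference sketch to make the intended behaviour concrete, not the dplyr solution being asked for; the function name `running_count` and the toy input vectors are hypothetical, and it assumes the rows are already in trial order with no missing values.

```r
# Hypothetical helper: running count with reset for ONE id/age group.
# Assumes rows are sorted by trial and contain no NA in the inputs.
running_count <- function(accuracy, block, condition) {
  out <- rep(NA_real_, length(accuracy))
  count <- 0
  for (i in seq_along(accuracy)) {
    if (block[i] == 2 && condition[i] == 1) {
      # A correct trial resets the count to 0; an incorrect one adds 1.
      count <- if (accuracy[i] == 1) 0 else count + 1
      out[i] <- count
    }
    # Any other row stays NA and leaves the running count untouched.
  }
  out
}

# Toy input (illustrative values, not the question's data)
accuracy  <- c(0, 0, 0, 0, 1, 0)
block     <- c(1, 2, 2, 2, 2, 2)
condition <- c(1, 1, 2, 1, 1, 1)

result <- running_count(accuracy, block, condition)
print(result)
# NA  1 NA  2  0  1
```

Note that the skipped row (condition = 2) does not interrupt the count: the next qualifying trial continues from 1 to 2.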

Here's a minimal working example:

mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
                 age = c(1,1,1,1,1,1,1,1,1,1,2),
                 block = c(1,1,2,2,2,2,2,2,2,2,2),
                 trial = c(1,2,1,2,3,4,5,6,7,8,1),
                 condition = c(1,1,1,1,1,2,1,1,1,1,1),
                 accuracy = c(0,0,0,0,0,0,0,1,0,0,0)
)

id  age block   trial   condition   accuracy
1   1   1       1       1           0
1   1   1       2       1           0
1   1   2       1       1           0
1   1   2       2       1           0
1   1   2       3       1           0
1   1   2       4       2           0
1   1   2       5       1           0
1   1   2       6       1           1
1   1   2       7       1           0
1   1   2       8       1           0
1   2   2       1       1           0

The expected output is:

id  age block   trial   condition   accuracy    cumulative
1   1   1       1       1           0           NA
1   1   1       2       1           0           NA
1   1   2       1       1           0           1
1   1   2       2       1           0           2
1   1   2       3       1           0           3
1   1   2       4       2           0           NA
1   1   2       5       1           0           4
1   1   2       6       1           1           0
1   1   2       7       1           0           1
1   1   2       8       1           0           2
1   2   2       1       1           0           1

Recommended answer

We can use case_when to assign the value we need based on our conditions. We then add an additional group_by condition, using cumsum to start a new group whenever the temp column is 0. In the final mutate step we temporarily replace the NA values in temp with 0, take the cumsum over it, and then put the NA values back in their place to get the final output.

library(dplyr)

mydata %>%
    group_by(id, age) %>%
    # temp is 1 for a trial that increments the count, 0 for a trial that
    # resets it, and NA for a trial that is left out of the count
    mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1, 
                            accuracy == 1 & block == 2 & condition == 1 ~ 0, 
                            TRUE ~ NA_real_)) %>%
    ungroup() %>%
    # start a new group at every reset (temp == 0) within each id/age
    group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
    mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                          is.na(temp), NA)) %>%
    # ungroup first: select() cannot drop a grouping column such as group
    ungroup() %>%
    select(-temp, -group)

#       id   age block trial condition accuracy cumulative
#    <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>      <dbl>
#  1     1     1     1     1         1        0         NA
#  2     1     1     1     2         1        0         NA
#  3     1     1     2     1         1        0          1
#  4     1     1     2     2         1        0          2
#  5     1     1     2     3         1        0          3
#  6     1     1     2     4         2        0         NA
#  7     1     1     2     5         1        0          4
#  8     1     1     2     6         1        1          0
#  9     1     1     2     7         1        0          1
# 10     1     1     2     8         1        0          2
# 11     1     2     2     1         1        0          1
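
To see why the extra grouping step produces the reset, it can help to look at the intermediate values on a single vector. The sketch below uses base R's `ave()` rather than dplyr purely to isolate the idea; the variable names `is_reset`, `filled`, and `steps` are illustrative, and `temp` holds the values the `case_when` step assigns to the id = 1, age = 1 rows.

```r
# temp for the id = 1, age = 1 rows: 1 = increment, 0 = reset, NA = skip
temp <- c(NA, NA, 1, 1, 1, NA, 1, 0, 1, 1)

# A new group starts at every reset; NA rows never start one.
is_reset <- !is.na(temp) & temp == 0
group <- cumsum(is_reset)

# Treat skipped rows as 0 so they do not break the cumulative sum,
# sum within each group, then restore NA on the skipped rows.
filled <- replace(temp, is.na(temp), 0)
cumulative <- ave(filled, group, FUN = cumsum)
cumulative[is.na(temp)] <- NA

steps <- data.frame(temp, group, cumulative)
print(steps)
```

Because the reset row itself contributes 0 and opens a new group, the count restarts at 0 there and the next qualifying trial resumes at 1.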
