Conditional running count (cumulative sum) with reset in R (dplyr)
Question
I'm trying to calculate a running count (i.e., a cumulative sum) that is conditional on other variables and that resets for particular values of another variable. I'm working in R and would prefer a dplyr-based solution, if possible.
I'd like to create a variable for the running count, cumulative, based on the following algorithm:
- Calculate the running count (cumulative) within combinations of id and age
- Increment the running count (cumulative) by 1 for every subsequent trial where accuracy = 0, block = 2, and condition = 1
- Reset the running count (cumulative) to 0 for each trial where accuracy = 1, block = 2, and condition = 1; the next increment resumes at 1 (not the previous number)
- For each trial where block != 2 or condition != 1, leave the running count (cumulative) as NA
Here's a minimal working example:
mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
age = c(1,1,1,1,1,1,1,1,1,1,2),
block = c(1,1,2,2,2,2,2,2,2,2,2),
trial = c(1,2,1,2,3,4,5,6,7,8,1),
condition = c(1,1,1,1,1,2,1,1,1,1,1),
accuracy = c(0,0,0,0,0,0,0,1,0,0,0)
)
id age block trial condition accuracy
1 1 1 1 1 0
1 1 1 2 1 0
1 1 2 1 1 0
1 1 2 2 1 0
1 1 2 3 1 0
1 1 2 4 2 0
1 1 2 5 1 0
1 1 2 6 1 1
1 1 2 7 1 0
1 1 2 8 1 0
1 2 2 1 1 0
The expected output is:
id age block trial condition accuracy cumulative
1 1 1 1 1 0 NA
1 1 1 2 1 0 NA
1 1 2 1 1 0 1
1 1 2 2 1 0 2
1 1 2 3 1 0 3
1 1 2 4 2 0 NA
1 1 2 5 1 0 4
1 1 2 6 1 1 0
1 1 2 7 1 0 1
1 1 2 8 1 0 2
1 2 2 1 1 0 1
Recommended answer
We can use case_when to assign the value we need based on our conditions, storing it in a temporary column temp (1 to increment, 0 to reset, NA otherwise). We then add an extra grouping variable to group_by, built with cumsum, that starts a new group each time temp is 0. In the final mutate step we temporarily replace the NA values in temp with 0, take the cumsum over it, and then put the NA values back in their place to get the final output.
library(dplyr)

mydata %>%
  group_by(id, age) %>%
  # temp: 1 = increment, 0 = reset, NA = trial outside block 2 / condition 1
  mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1,
                          accuracy == 1 & block == 2 & condition == 1 ~ 0,
                          TRUE ~ NA_real_)) %>%
  ungroup() %>%
  # group: increases by 1 at every reset row, so each reset starts a new run
  group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
  # treat NA as 0 while summing, then restore NA on the out-of-scope rows
  mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                              is.na(temp), NA)) %>%
  # drop the grouping first; otherwise select() keeps the grouping column `group`
  ungroup() %>%
  select(-temp, -group)

#       id   age block trial condition accuracy cumulative
#    <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>      <dbl>
#  1     1     1     1     1         1        0         NA
#  2     1     1     1     2         1        0         NA
#  3     1     1     2     1         1        0          1
#  4     1     1     2     2         1        0          2
#  5     1     1     2     3         1        0          3
#  6     1     1     2     4         2        0         NA
#  7     1     1     2     5         1        0          4
#  8     1     1     2     6         1        1          0
#  9     1     1     2     7         1        0          1
# 10     1     1     2     8         1        0          2
# 11     1     2     2     1         1        0          1
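For readers who want to see the same idea without dplyr, here is a minimal base R sketch of the steps described above. It is not part of the original answer; the helper names in_scope, temp, reset_id, filled, and running are made up for illustration, and it assumes the rows are already sorted in trial order within each id and age.

```r
# Same example data as in the question
mydata <- data.frame(id        = c(1,1,1,1,1,1,1,1,1,1,1),
                     age       = c(1,1,1,1,1,1,1,1,1,1,2),
                     block     = c(1,1,2,2,2,2,2,2,2,2,2),
                     trial     = c(1,2,1,2,3,4,5,6,7,8,1),
                     condition = c(1,1,1,1,1,2,1,1,1,1,1),
                     accuracy  = c(0,0,0,0,0,0,0,1,0,0,0))

# Rows that take part in the count at all (hypothetical helper name)
in_scope <- mydata$block == 2 & mydata$condition == 1

# 1 = increment, 0 = reset, NA = out of scope
temp <- ifelse(in_scope, ifelse(mydata$accuracy == 0, 1, 0), NA)

# Counter that steps up at every reset row, so each reset opens a new run
reset_id <- cumsum(in_scope & mydata$accuracy == 1)

# Count NA rows as 0 while summing
filled <- ifelse(is.na(temp), 0, temp)

# Cumulative sum within each id / age / reset_id combination
running <- ave(filled, mydata$id, mydata$age, reset_id, FUN = cumsum)

# Put NA back on the out-of-scope rows
mydata$cumulative <- ifelse(in_scope, running, NA)

print(mydata)
```

On the example data this should reproduce the cumulative column from the expected output. The key design choice is the same as in the dplyr version: the reset rows define an extra grouping variable, so an ordinary grouped cumsum restarts on its own after each reset.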