Conditional running count (cumulative sum) with reset in R (dplyr)
Question
I'm trying to calculate a running count (i.e., cumulative sum) that is conditional on other variables and that can reset for particular values of another variable. I'm working in R and would prefer a dplyr-based solution, if possible.
I'd like to create a variable for the running count, cumulative, based on the following algorithm:
- Calculate the running count (cumulative) within combinations of id and age.
- Increment the running count (cumulative) by 1 for every subsequent trial where accuracy = 0, block = 2, and condition = 1.
- Reset the running count (cumulative) to 0 for each trial where accuracy = 1, block = 2, and condition = 1; the next increment then resumes at 1 (not at the previous number).
- For each trial where block != 2 or condition != 1, leave the running count (cumulative) as NA.
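Read literally, the rules above amount to a simple loop over the rows of one id/age combination. The following base-R sketch is only meant to pin down the intended behaviour, not to be the dplyr solution being asked for. The helper name running_count is hypothetical, and it assumes that accuracy is always 0 or 1, that there are no missing values, and that rows are already in trial order.

```r
# Hypothetical helper (not part of the original post): applies the rules to
# the rows of a single id/age combination, assumed to be in trial order.
# Assumes accuracy is 0 or 1 and that no input contains NA.
running_count <- function(accuracy, block, condition) {
  out <- rep(NA_real_, length(accuracy))
  count <- 0
  for (i in seq_along(accuracy)) {
    # Rows outside block 2 / condition 1 stay NA and do not touch the count.
    if (block[i] != 2 || condition[i] != 1) next
    # accuracy == 1 resets the count to 0; accuracy == 0 increments it.
    count <- if (accuracy[i] == 1) 0 else count + 1
    out[i] <- count
  }
  out
}

running_count(accuracy  = c(0, 0, 1, 0),
              block     = c(1, 2, 2, 2),
              condition = c(1, 1, 1, 1))
# NA 1 0 1
```

To cover several id/age combinations, the helper would have to be applied to each combination separately.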
Here's a minimal working example:
mydata <- data.frame(id        = c(1,1,1,1,1,1,1,1,1,1,1),
                     age       = c(1,1,1,1,1,1,1,1,1,1,2),
                     block     = c(1,1,2,2,2,2,2,2,2,2,2),
                     trial     = c(1,2,1,2,3,4,5,6,7,8,1),
                     condition = c(1,1,1,1,1,2,1,1,1,1,1),
                     accuracy  = c(0,0,0,0,0,0,0,1,0,0,0)
)
id age block trial condition accuracy
1 1 1 1 1 0
1 1 1 2 1 0
1 1 2 1 1 0
1 1 2 2 1 0
1 1 2 3 1 0
1 1 2 4 2 0
1 1 2 5 1 0
1 1 2 6 1 1
1 1 2 7 1 0
1 1 2 8 1 0
1 2 2 1 1 0
The expected output is:
id age block trial condition accuracy cumulative
1 1 1 1 1 0 NA
1 1 1 2 1 0 NA
1 1 2 1 1 0 1
1 1 2 2 1 0 2
1 1 2 3 1 0 3
1 1 2 4 2 0 NA
1 1 2 5 1 0 4
1 1 2 6 1 1 0
1 1 2 7 1 0 1
1 1 2 8 1 0 2
1 2 2 1 1 0 1
Recommended answer
We can use case_when to assign the value we need based on our conditions. We then add an additional group_by condition using cumsum, so that a new group starts whenever the temp column is 0. In the final mutate step, we temporarily replace the NA values in temp with 0, take the cumsum over it, and then put the NA values back in their place to get the final output.
library(dplyr)

mydata %>%
  group_by(id, age) %>%
  # temp: 1 = increment, 0 = reset, NA = row excluded from the count
  mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1,
                          accuracy == 1 & block == 2 & condition == 1 ~ 0,
                          TRUE ~ NA_real_)) %>%
  ungroup() %>%
  # start a new group at every reset row (temp == 0)
  group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
  # cumulative sum with NA treated as 0, then restore the NA rows
  mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                              is.na(temp), NA)) %>%
  # ungroup first; otherwise select() keeps the grouping column "group"
  ungroup() %>%
  select(-temp, -group)

#       id   age block trial condition accuracy cumulative
#    <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>      <dbl>
#  1     1     1     1     1         1        0         NA
#  2     1     1     1     2         1        0         NA
#  3     1     1     2     1         1        0          1
#  4     1     1     2     2         1        0          2
#  5     1     1     2     3         1        0          3
#  6     1     1     2     4         2        0         NA
#  7     1     1     2     5         1        0          4
#  8     1     1     2     6         1        1          0
#  9     1     1     2     7         1        0          1
# 10     1     1     2     8         1        0          2
# 11     1     2     2     1         1        0          1
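To see why the extra grouping produces the reset, it can help to keep the intermediate columns instead of dropping them. The sketch below reworks the same idea on a small illustrative data set; the names demo, steps, and filled are made up for this example, and coalesce() and if_else() stand in for the nested replace() calls.

```r
library(dplyr)

# Small illustrative data set (not the question's full data).
demo <- data.frame(
  id        = 1,
  age       = 1,
  block     = 2,
  condition = c(1, 1, 2, 1, 1, 1),
  accuracy  = c(0, 0, 0, 0, 1, 0)
)

steps <- demo %>%
  mutate(
    # 1 = increment, 0 = reset, NA = row excluded from the count
    temp = case_when(
      accuracy == 0 & block == 2 & condition == 1 ~ 1,
      accuracy == 1 & block == 2 & condition == 1 ~ 0,
      TRUE ~ NA_real_
    ),
    # the segment id goes up by one at every reset row
    group = cumsum(!is.na(temp) & temp == 0)
  ) %>%
  group_by(id, age, group) %>%
  mutate(
    # treat NA as 0 so the cumulative sum carries on past excluded rows
    filled = coalesce(temp, 0),
    # put NA back on the excluded rows
    cumulative = if_else(is.na(temp), NA_real_, cumsum(filled))
  ) %>%
  ungroup()

steps$temp        # 1 1 NA 1 0 1
steps$group       # 0 0 0 0 1 1
steps$cumulative  # 1 2 NA 3 0 1
```

Because the reset row itself contributes 0 and opens a new segment, the sum inside that segment starts at 0 and the next counted row yields 1.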