Conditional cumulative mean for each group in R
Question
I have a data set that looks like this:
id a b
1 AA 2
1 AB 5
1 AA 1
2 AB 2
2 AB 4
3 AB 4
3 AB 3
3 AA 1
I need to calculate the cumulative mean of b for each record within each group, excluding rows where a == 'AA'. The sample output should be:
id a b mean
1 AA 2 -
1 AB 5 5
1 AA 1 5
2 AB 2 2
2 AB 4 (4+2)/2
3 AB 4 4
3 AB 3 (4+3)/2
3 AA 1 (4+3)/2
3 AA 4 (4+3)/2
I tried to do this with dplyr and cummean, but I get an error:
df <- df %>%
group_by(id) %>%
mutate(mean = cummean(b[a != 'AA']))
Error: incompatible size (123), expecting 147 (the group size) or 1
Can you suggest a better way to achieve this in R?
Recommended answer
The trick here is to reconstruct the cummean by dividing the adjusted cumsum by the adjusted count. As a one-liner:
df %>% group_by(id) %>% mutate(cumsum(b * (a != 'AA')) / cumsum(a != 'AA'))
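As a quick check, here is a minimal sketch that runs the one-liner on the sample data from the question. The data frame is retyped by hand from the table, and the column name mean is borrowed from the question's expected output. Note that while a group has seen only 'AA' rows, the expression evaluates 0/0, which R reports as NaN, so the first row of id 1 comes out as NaN rather than the "-" shown in the question.

```r
library(dplyr)

# Sample data retyped from the question (an assumption about the real df)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA"),
  b  = c(2, 5, 1, 2, 4, 4, 3, 1),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  # running sum of the non-'AA' values divided by the running count of non-'AA' rows
  mutate(mean = cumsum(b * (a != "AA")) / cumsum(a != "AA")) %>%
  ungroup()

res$mean
# NaN 5.0 5.0 2.0 3.0 4.0 3.5 3.5
```

Because the logical a != "AA" is coerced to 0 or 1, 'AA' rows add nothing to either running total and simply carry the previous mean forward.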
We can make this a little nicer (multiplying by a != 'AA' is the ugly bit of "magic", to my mind) by pulling a != 'AA' out into its own column:
df %>%
group_by(id) %>%
mutate(relevance = 0+(a!='AA'),
mean = cumsum(relevance * b) / cumsum(relevance))
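If dplyr is not available, or if the NaN produced for leading 'AA' rows (0 divided by 0) is unwanted, the same idea can be sketched in base R. This uses ave() in place of group_by/mutate, which is a swapped-in approach and not part of the original answer; the names keep, s and n are made up for illustration, and the data frame is retyped from the question's table.

```r
# Sample data retyped from the question (an assumption about the real df)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA"),
  b  = c(2, 5, 1, 2, 4, 4, 3, 1),
  stringsAsFactors = FALSE
)

keep <- as.numeric(df$a != "AA")           # 1 for rows that count, 0 for 'AA'
s <- ave(df$b * keep, df$id, FUN = cumsum) # per-group running sum of counted values
n <- ave(keep, df$id, FUN = cumsum)        # per-group running count of counted rows

# Use NA where no non-'AA' row has been seen yet, instead of NaN from 0/0
df$mean <- ifelse(n > 0, s / n, NA_real_)

df$mean
# NA 5.0 5.0 2.0 3.0 4.0 3.5 3.5
```

Whether NA, NaN, or some other placeholder is the right value for those leading rows depends on how the column is used downstream.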