Conditional cumulative mean for each group in R

Question

I have a data set that looks like this:

id   a   b
1    AA  2
1    AB  5
1    AA  1
2    AB  2
2    AB  4
3    AB  4
3    AB  3
3    AA  1

I need to calculate the cumulative mean of b for each record within each id group, excluding rows where a == 'AA'. Rows with a == 'AA' should carry forward the mean accumulated so far. The sample output should be:

id   a   b  mean
1    AA  2   -
1    AB  5   5
1    AA  1   5
2    AB  2   2
2    AB  4   (4+2)/2
3    AB  4   4
3    AB  3   (4+3)/2
3    AA  1   (4+3)/2
3    AA  4   (4+3)/2

I tried to do this with dplyr and cummean, but I get an error:

df <- df %>%
       group_by(id) %>%
       mutate(mean = cummean(b[a != 'AA']))




Error: incompatible size (123), expecting 147 (the group size) or 1
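
The size mismatch in that message comes from the subsetting: b[a != 'AA'] drops the 'AA' rows, so the vector handed to mutate is shorter than the group, and mutate only accepts a result of the group size or of length 1. A minimal sketch of the effect in base R, using the id == 3 rows of the sample data (the variable name kept is only for illustration):

```r
# Group id == 3 from the question's sample data.
a <- c("AB", "AB", "AA")
b <- c(4, 3, 1)

kept <- b[a != "AA"]   # subsetting drops the 'AA' row

length(b)      # 3 rows in the group
length(kept)   # 2 values left, so a cumulative mean of `kept` cannot fill 3 rows

# Base R equivalent of a cumulative mean, used here to avoid needing dplyr.
cumsum(kept) / seq_along(kept)
```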

Can you suggest a better way to achieve this in R?

Recommended answer

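The trick here is to reconstruct the cumulative mean by dividing the adjusted cumsum (the running sum of b over the non-'AA' rows) by the adjusted count (the running number of non-'AA' rows). Because both vectors keep the full group length, mutate accepts the result. As a one-liner: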

df %>% group_by(id) %>% mutate(cumsum(b * (a != 'AA')) / cumsum(a != 'AA'))
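
To see why the one-liner above works, it may help to trace it by hand on a single group. The sketch below uses the id == 1 rows from the question in base R; the names keep, running_sum and running_count are only for illustration. Note that a group starting with an 'AA' row yields 0/0, which R reports as NaN, where the question's sample output shows "-".

```r
# Group id == 1 from the question's sample data.
a <- c("AA", "AB", "AA")
b <- c(2, 5, 1)

keep <- a != "AA"                  # FALSE TRUE FALSE

# A logical multiplies as 0/1, so 'AA' rows contribute nothing to the sum.
running_sum   <- cumsum(b * keep)  # 0 5 5
running_count <- cumsum(keep)      # 0 1 1

running_mean <- running_sum / running_count
print(running_mean)                # NaN 5 5
```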

We can make this a little nicer (multiplying by a != 'AA' feels like ugly magic to me) by pulling a != 'AA' out into its own column:

df %>%
    group_by(id) %>%
    mutate(relevance = 0+(a!='AA'), 
           mean = cumsum(relevance * b) / cumsum(relevance))
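
For reference, here is a self-contained sketch that runs the same idea on the eight input rows from the question. It assumes dplyr is installed. Converting NaN to NA for groups that start with an 'AA' row is an extra step added here, not part of the answer, and the helper column name keep is hypothetical.

```r
library(dplyr)

# Sample data copied from the question (the eight input rows).
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA"),
  b  = c(2, 5, 1, 2, 4, 4, 3, 1)
)

result <- df %>%
  group_by(id) %>%
  mutate(
    keep = a != "AA",                            # TRUE for rows that count toward the mean
    mean = cumsum(b * keep) / cumsum(keep),      # running sum / running count of kept rows
    mean = ifelse(is.nan(mean), NA_real_, mean)  # assumption: report "no data yet" as NA
  ) %>%
  ungroup() %>%
  select(-keep)

print(result)
```

The ungroup() call at the end is a design choice: it returns a plain data frame so that later operations are not silently applied per id.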
