dplyr returns global mean for each group, instead of each group's mean

Question

Can someone explain what I am doing wrong here:

library(dplyr)
temp<-data.frame(a=c(1,2,3,1,2,3,1,2,3),b=c(1,2,3,1,2,3,1,2,3))
temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(temp[,2],na.rm=T))

# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     2
2           2     3     2
3           3     3     2

I expected the means to be:

1  1
2  2
3  3

Instead, the mean appears to be the global mean (the sum of all values in column 2 divided by the number of rows): 18/9 = 2.

How do I get the mean to be what I expected?

Solution

Your problem is that you are calculating the mean of temp[,2] instead of the column within the group. mean(temp[,2], na.rm=T) refers to the whole data frame and does not depend on the grouping context at all, so every group gets the same value. Use the bare column name instead:

> temp %>% group_by(temp[,1]) %>% summarise(n=n(), mean=mean(b, na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     1
2           2     3     2
3           3     3     3
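The difference can be reproduced without dplyr. The base R sketch below uses tapply(), which is not part of the answer, purely as a cross-check. It shows that temp[, 2] always reads the full column, so every group receives 2, while splitting b by the values of a first gives one mean per group:

```r
# Same data as in the question
temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# temp[, 2] indexes the whole data frame, so this is 18 / 9 = 2
# no matter which group is being summarised
global_mean <- mean(temp[, 2], na.rm = TRUE)

# tapply() splits b by the values of a, then averages each piece
group_means <- tapply(temp$b, temp$a, mean, na.rm = TRUE)

print(global_mean)
print(group_means)
```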

Furthermore, it is more common to use the column name in group_by() as well. Here temp[,1] is column a:

> temp %>% group_by(a) %>% summarise(n=n(), mean=mean(b, na.rm=T))
# A tibble: 3 × 3
      a     n  mean
  <dbl> <int> <dbl>
1     1     3     1
2     2     3     2
3     3     3     3
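For comparison, here is a base R sketch of the same per-group summary: the mean of b within each value of a, plus the group size that n() reports. It uses aggregate() rather than dplyr and is only meant as a cross-check, not as a replacement for the answer above:

```r
temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Mean of b within each value of a (the formula interface evaluates b per group)
means <- aggregate(b ~ a, data = temp, FUN = mean)

# Number of rows per group, the counterpart of dplyr's n()
sizes <- aggregate(b ~ a, data = temp, FUN = length)

# aggregate() sorts groups by a in both calls, so the rows line up
result <- data.frame(a = means$a, n = sizes$b, mean = means$b)
print(result)
```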
