dplyr returns global mean for each group, instead of each group's mean
Can someone explain what I am doing wrong here:
library(dplyr)
temp<-data.frame(a=c(1,2,3,1,2,3,1,2,3),b=c(1,2,3,1,2,3,1,2,3))
temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(temp[,2],na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     2
2           2     3     2
3           3     3     2
I expected the means to be:
1 1
2 2
3 3
Instead, the mean seems to be the global mean (the sum of all values in column 2 divided by the number of rows): 18/9 = 2.
How do I get the mean to be what I expected?
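Your problem is that you are calculating the mean of temp[,2], which is the entire column of the original data frame, instead of the column within each group. The expression mean(temp[,2],na.rm=T) does not depend on the grouping context at all. You need to refer to the column by its bare name: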
> temp %>% group_by(temp[,1]) %>% summarise(n=n(), mean=mean(b, na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     1
2           2     3     2
3           3     3     3
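To see why the two expressions differ, it can help to compare how many values each one sees inside summarise. The following is a minimal sketch using the same data as the question; the column names len_full and len_group are made up for illustration, and the grouping uses a, which is the column that temp[,1] refers to.

```r
library(dplyr)

temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

sizes <- temp %>%
  group_by(a) %>%
  summarise(
    # `temp` is looked up outside the grouped data, so this is the
    # full column (9 values) for every group.
    len_full = length(temp[, 2]),
    # The bare name `b` is resolved within the current group (3 values).
    len_group = length(b)
  )

print(sizes)
```

Under these assumptions, len_full should be 9 in every row while len_group is 3, which is why mean(temp[,2]) comes out as the same global value for each group.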
Furthermore, it is more common to use the column name in group_by as well:
> temp %>% group_by(b) %>% summarise(n=n(), mean=mean(b, na.rm=T))
# A tibble: 3 × 3
      b     n  mean
  <dbl> <int> <dbl>
1     1     3     1
2     2     3     2
3     3     3     3
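Since temp[,1] in the question is column a, the by-name equivalent of the original grouping would be group_by(a); with this data a and b hold the same values, so the result is the same either way. The sketch below assumes that reading and cross-checks the dplyr result against base R's tapply. The variable names by_name and base_means are illustrative only.

```r
library(dplyr)

temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Group by column a (what temp[, 1] refers to) and average b within each group.
by_name <- temp %>%
  group_by(a) %>%
  summarise(n = n(), mean = mean(b, na.rm = TRUE))

# Base R cross-check: mean of b for each distinct value of a.
base_means <- tapply(temp$b, temp$a, mean, na.rm = TRUE)

print(by_name)
print(base_means)
```

Both approaches should give per-group means of 1, 2 and 3 for this data.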