Summarizing multiple columns with dplyr?
I'm struggling a bit with the dplyr syntax. I have a data frame with several variables and one grouping variable. I want to calculate the mean of each column within each group, using dplyr in R.
df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))
df %>% group_by(grp) %>% summarise(mean(a))
This gives me the mean for column "a" for each group indicated by "grp".
My question is: is it possible to get the means for each column within each group at once? Or do I have to repeat df %>% group_by(grp) %>% summarise(mean(a)) for each column?
What I would like to have is something like
df %>% group_by(grp) %>% summarise(mean(a:d)) # "mean(a:d)" does not work
dplyr 0.2 contains summarise_each for this purpose:
df %>% group_by(grp) %>% summarise_each(funs(mean))
#> Source: local data frame [3 x 5]
#>
#> grp a b c d
#> (int) (dbl) (dbl) (dbl) (dbl)
#> 1 1 3.000000 2.666667 2.666667 3.333333
#> 2 2 2.666667 2.666667 2.500000 2.833333
#> 3 3 4.000000 1.000000 4.000000 3.000000
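In current dplyr releases (1.0.0 and later), summarise_each() and funs() are deprecated, and the same result is usually written with across(). A minimal sketch, assuming dplyr 1.0.0 or newer is installed; the seed is an arbitrary value added here only to make the sample reproducible and is not part of the original post:

```r
library(dplyr)

set.seed(42)  # arbitrary seed (assumption), only for reproducibility
df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))

# Mean of every non-grouping column within each group
res <- df %>%
  group_by(grp) %>%
  summarise(across(everything(), mean))

# Same idea restricted to the column range a through d,
# which is what mean(a:d) in the question was reaching for
res_ad <- df %>%
  group_by(grp) %>%
  summarise(across(a:d, mean))
```

Inside a grouped summarise(), everything() skips the grouping column, so both calls return one row per value of grp with the columns grp, a, b, c and d.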
Alternatively, the purrr package provides the same functionality (in later releases slice_rows and dmap were moved to the purrrlyr package):
df %>% slice_rows("grp") %>% dmap(mean)
#> Source: local data frame [3 x 5]
#>
#> grp a b c d
#> (int) (dbl) (dbl) (dbl) (dbl)
#> 1 1 3.000000 2.666667 2.666667 3.333333
#> 2 2 2.666667 2.666667 2.500000 2.833333
#> 3 3 4.000000 1.000000 4.000000 3.000000
Also don't forget about data.table:
setDT(df)[, lapply(.SD, mean), by = grp]
#> grp a b c d
#> 1: 3 3.714286 3.714286 2.428571 2.428571
#> 2: 1 1.000000 4.000000 5.000000 2.000000
#> 3: 2 4.000000 4.500000 3.000000 3.000000
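If only some columns should be averaged, data.table's .SDcols argument restricts .SD to those columns. A small sketch, assuming the data.table package is installed; the seed and the choice of columns a and b are illustrative assumptions, not part of the original answer:

```r
library(data.table)

set.seed(42)  # arbitrary seed (assumption), only for reproducibility
df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))

# as.data.table() makes a copy; setDT(df) would convert df in place instead
dt <- as.data.table(df)

# Group means for columns a and b only
res <- dt[, lapply(.SD, mean), by = grp, .SDcols = c("a", "b")]
```

The result has one row per group, in order of first appearance of each grp value, with the columns grp, a and b.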