Summarizing multiple columns with dplyr?

Views: 85 | Published: 2017/7/13 19:59:20 | Tags: r, dplyr

Problem description

I'm struggling a bit with the dplyr syntax. I have a data frame with several variables and one grouping variable. Now I want to calculate the mean of each column within each group, using dplyr in R.

library(dplyr)

df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))
df %>% group_by(grp) %>% summarise(mean(a))

This gives me the mean for column "a" for each group indicated by "grp".

My question is: is it possible to get the means for each column within each group at once? Or do I have to repeat df %>% group_by(grp) %>% summarise(mean(a)) for each column?

What I would like to have is something like

df %>% group_by(grp) %>% summarise(mean(a:d)) # "mean(a:d)" does not work

Solution

dplyr 0.2 provides summarise_each() for this purpose:

df %>% group_by(grp) %>% summarise_each(funs(mean))
#> Source: local data frame [3 x 5]
#> 
#>     grp        a        b        c        d
#>   (int)    (dbl)    (dbl)    (dbl)    (dbl)
#> 1     1 3.000000 2.666667 2.666667 3.333333
#> 2     2 2.666667 2.666667 2.500000 2.833333
#> 3     3 4.000000 1.000000 4.000000 3.000000
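
In later dplyr releases, summarise_each() and funs() were deprecated, and from dplyr 1.0.0 onward the usual way to express the same idea is across() inside summarise(). The following is a minimal sketch of that approach, not part of the original answer; the seed value and the variable names means and stats are assumptions added for illustration.

```r
library(dplyr)

# Illustrative data in the same shape as the question's; the seed is an
# arbitrary assumption added only to make the sample reproducible.
set.seed(1)
df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))

# Mean of columns a through d within each group
means <- df %>%
  group_by(grp) %>%
  summarise(across(a:d, mean))

# Several summaries per column: a named list yields columns such as
# a_mean and a_sd
stats <- df %>%
  group_by(grp) %>%
  summarise(across(a:d, list(mean = mean, sd = sd)))
```

Selecting a:d explicitly keeps the grouping column out of the computation; across(everything(), mean) should behave the same way here, since grouping columns are excluded from across() in a grouped summarise().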

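Alternatively, the purrr package provided the same functionality at the time of writing (slice_rows() and dmap() have since been moved out of purrr into the purrrlyr package):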

df %>% slice_rows("grp") %>% dmap(mean)
#> Source: local data frame [3 x 5]
#> 
#>     grp        a        b        c        d
#>   (int)    (dbl)    (dbl)    (dbl)    (dbl)
#> 1     1 3.000000 2.666667 2.666667 3.333333
#> 2     2 2.666667 2.666667 2.500000 2.833333
#> 3     3 4.000000 1.000000 4.000000 3.000000

Also don't forget about data.table:

setDT(df)[, lapply(.SD, mean), by = grp]
#>    grp        a        b        c        d
#> 1:   3 3.714286 3.714286 2.428571 2.428571
#> 2:   1 1.000000 4.000000 5.000000 2.000000
#> 3:   2 4.000000 4.500000 3.000000 3.000000
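
Note that setDT() converts df to a data.table by reference, so df itself is changed. A hedged variant that leaves the original data frame untouched and restricts the summary to chosen columns might look like the sketch below; the seed, the names dt and res, and the choice of columns a and b are assumptions for illustration only.

```r
library(data.table)

# Illustrative data in the same shape as the question's; the seed is an
# arbitrary assumption added only to make the sample reproducible.
set.seed(1)
df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))

# as.data.table() makes a copy, so df stays a plain data frame
dt <- as.data.table(df)

# keyby sorts the result by grp; .SDcols limits .SD to the named columns
res <- dt[, lapply(.SD, mean), keyby = grp, .SDcols = c("a", "b")]
```

With by = grp instead of keyby = grp, groups appear in order of first occurrence, which is why the output above is not sorted by grp.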
