Can dplyr summarise over several variables without listing each one?

Question

dplyr is amazingly fast, but I wonder if I'm missing something: is it possible to summarise over several variables at once? For example:

library(dplyr)
library(reshape2)

(df=dput(structure(list(sex = structure(c(1L, 1L, 2L, 2L), .Label = c("boy", 
"girl"), class = "factor"), age = c(52L, 58L, 40L, 62L), bmi = c(25L, 
23L, 30L, 26L), chol = c(187L, 220L, 190L, 204L)), .Names = c("sex", 
"age", "bmi", "chol"), row.names = c(NA, -4L), class = "data.frame")))

   sex age bmi chol
1  boy  52  25  187
2  boy  58  23  220
3 girl  40  30  190
4 girl  62  26  204

dg=group_by(df,sex)

With this small data frame, it's easy to write

summarise(dg,mean(age),mean(bmi),mean(chol))

And I know that to get what I want, I could melt, compute the means, and then dcast, like so:

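# reshape to long format: one row per (sex, variable, value)
dm=melt(df, id.var='sex')
dmg=group_by(dm, sex, variable)
# mean of each variable within each sex
x=summarise(dmg, means=mean(value))
# back to wide format; name the value column explicitly instead of letting dcast guess it
dcast(x, sex~variable, value.var='means')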

But what if I have more than 20 variables and a very large number of rows? Is there anything similar to .SD in data.table that would let me take the means of all variables in the grouped data frame? Or is it possible to somehow use lapply on the grouped data frame?

Thanks for any help.

Recommended answer

The data.table idiom is lapply(.SD, mean), which is

library(data.table)
DT <- data.table(df)
DT[, lapply(.SD, mean), by = sex]
#     sex age bmi  chol
# 1:  boy  55  24 203.5
# 2: girl  51  28 197.0
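
If only some of the columns should be averaged, `.SDcols` restricts what `.SD` contains. A minimal sketch, assuming the same data as in the question (rebuilt here so the block runs on its own); the choice of `age` and `chol` is purely illustrative and not part of the original answer:

```r
library(data.table)

# Same toy data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

DT <- as.data.table(df)

# .SDcols limits .SD to the named columns; bmi is left out of the result
res_dt <- DT[, lapply(.SD, mean), by = sex, .SDcols = c("age", "chol")]
res_dt
```

With more than 20 variables, the vector passed to `.SDcols` can be built programmatically, for example from `setdiff(names(DT), "sex")`.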

I'm not sure of a dplyr idiom for the same thing, but you can do something like

dg <- group_by(df, sex)
# the names of the columns you want to summarize
cols <- names(dg)[-1]
# the dots component of your call to summarise
dots <- sapply(cols, function(x) substitute(mean(x), list(x = as.name(x))))
do.call(summarise, c(list(.data=dg), dots))
# Source: local data frame [2 x 4]

#    sex age bmi  chol
# 1  boy  55  24 203.5
# 2 girl  51  28 197.0
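
To see what the `substitute()` step above produces, it may help to inspect the constructed calls on their own. The sketch below uses only base R and the question's data; it uses `lapply()` plus explicit names rather than `sapply()`, which is an illustrative variation and not the answer's exact code:

```r
# Same toy data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

cols <- c("age", "bmi", "chol")

# For each column name, build the unevaluated call mean(<column>)
dots <- lapply(cols, function(x) substitute(mean(x), list(x = as.name(x))))
names(dots) <- cols

dots$age            # the call mean(age), not yet evaluated
eval(dots$age, df)  # evaluated against the ungrouped data: 53
```

Each element of `dots` is a language object, so passing the list to `summarise` through `do.call` is equivalent to typing `age = mean(age), bmi = mean(bmi), chol = mean(chol)` by hand.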

Note that there is a GitHub issue, #178, about efficiently implementing the plyr idiom colwise in dplyr.
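
Later dplyr releases added a column-wise helper for this. The following is a sketch that assumes dplyr 1.0.0 or later, where `across()` is available; it is not part of the original answer, and older versions would need a different approach:

```r
library(dplyr)

# Same toy data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# across(everything(), mean) applies mean to every non-grouping column
res <- df %>%
  group_by(sex) %>%
  summarise(across(everything(), mean))
res
```

Inside `summarise`, `everything()` skips the grouping column, so `sex` does not need to be excluded by hand. To average only numeric columns in a wider table, `across(where(is.numeric), mean)` is one option.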
