Aggregate sum and mean in R with ddply
Question
My data frame has two columns that are used as a grouping key, 17 columns that need to be summed in each group, and one column that should be averaged instead. Let me illustrate this on a different data frame, diamonds from ggplot2.
I know I can do this:
ddply(diamonds, ~cut, summarise, x=sum(x), y=sum(y), z=sum(z), price=mean(price))
But while it is reasonable for 3 columns, it is unacceptable for 17 of them.
When researching this, I found the colwise function, but the best I came up with is this:
cbind(ddply(diamonds, ~cut, colwise(sum, 7:9)), price=ddply(diamonds, ~cut, summarise, mean(price))[,2])
Is there a possibility to improve this even further? I would like to do it in a more straightforward way, something like (imaginary commands):
ddply(diamonds, ~cut, colwise(sum, 7:9), price=mean(price))
Or:
ddply(diamonds, ~cut, colwise(sum, 7:9), colwise(mean, ~price))
To summarize:
- I don't want to have to type all 17 columns explicitly, like the first example does with x, y, and z.
- Ideally, I would like to do it with a single call to ddply, without resorting to cbind (or similar functions) as in the second example.
For reference, the result I expect is 5 rows and 5 columns:
cut x y z price
1 Fair 10057.50 9954.07 6412.26 4358.758
2 Good 28645.08 28703.75 17855.42 3928.864
3 Very Good 69359.09 69713.45 43009.52 3981.760
4 Premium 82385.88 81985.82 50297.49 4584.258
5 Ideal 118691.07 118963.24 73304.61 3457.542
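For comparison, here is a minimal plyr-only sketch that produces a result of this shape with a single ddply call and no cbind. It assumes the summary function may be written by hand: colSums() sums the listed columns and the mean price is added as one more column. The vector sum_cols is a hypothetical stand-in for the 17 real column names, and the as.data.frame() conversion is a precaution because recent ggplot2 versions ship diamonds as a tibble.

```r
library(plyr)
library(ggplot2)  # provides the diamonds data set

dia <- as.data.frame(diamonds)  # plain data.frame, in case diamonds is a tibble

# Hypothetical: names of the columns to sum (the real data would list all 17)
sum_cols <- c("x", "y", "z")

res <- ddply(dia, ~cut, function(d) {
  # colSums() returns a named numeric vector; as.list() makes each entry
  # its own column, and price is appended as the per-group mean
  data.frame(as.list(colSums(d[sum_cols])), price = mean(d$price))
})
res
```

ddply prepends the grouping column, so the output has the columns cut, x, y, z and price, one row per cut level.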
Answer
Another solution uses dplyr. First you apply both aggregate functions to every variable you want aggregated. From the resulting variables, you then select only the desired function/variable combinations.
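library(dplyr)
library(ggplot2)  # provides the diamonds data set
# Note: summarise_each() and funs() are deprecated in current dplyr releases
diamonds %>%
  group_by(cut) %>%
  # sum and mean of x, y, z and price, named x_sum, x_mean, ..., price_mean
  summarise_each(funs(sum, mean), x:z, price) %>%
  # keep only the sums of x, y, z and the mean of price
  select(cut, matches("[xyz]_sum"), price_mean)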
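With dplyr 1.0 or later, a sketch of the same aggregation can use across() inside summarise(), which avoids computing the unwanted combinations and needs no select() step. The range x:z stands in for the 17 columns to sum; a different tidyselect specification, such as all_of() with a character vector of names, would work the same way.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

res2 <- diamonds %>%
  group_by(cut) %>%
  summarise(
    across(x:z, sum),    # sum every column from x through z
    price = mean(price)  # average price per group
  )
res2
```

Because across() keeps the original column names, the result has the columns cut, x, y, z and price without any renaming.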