Aggregate sum and mean in R with ddply

Question

My data frame has two columns that are used as a grouping key, 17 columns that need to be summed in each group, and one column that should be averaged instead. Let me illustrate this on a different data frame, diamonds from ggplot2.

I know I can do this:

ddply(diamonds, ~cut, summarise, x=sum(x), y=sum(y), z=sum(z), price=mean(price))

But while this is reasonable for 3 columns, it is unacceptable for 17 of them.

When researching this, I found the colwise function, but the best I came up with is this:

cbind(ddply(diamonds, ~cut, colwise(sum, 7:9)), price=ddply(diamonds, ~cut, summarise, mean(price))[,2])

Is it possible to improve this even further? I would like to do it in a more straightforward way, something like (imaginary commands):

ddply(diamonds, ~cut, colwise(sum, 7:9), price=mean(price))

or:

ddply(diamonds, ~cut, colwise(sum, 7:9), colwise(mean, ~price))

To summarize:

  • I don't want to have to type all 17 columns explicitly, as the first example does with x, y, and z.
  • Ideally, I would like to do it with a single call to ddply, without resorting to cbind (or similar functions) as the second example does.

For reference, the result I expect is 5 rows and 5 columns:

        cut         x         y        z    price
1      Fair  10057.50   9954.07  6412.26 4358.758
2      Good  28645.08  28703.75 17855.42 3928.864
3 Very Good  69359.09  69713.45 43009.52 3981.760
4   Premium  82385.88  81985.82 50297.49 4584.258
5     Ideal 118691.07 118963.24 73304.61 3457.542

Recommended answer

Another solution, using dplyr. First, apply both aggregate functions to every variable you want to aggregate. Then, from the resulting variables, select only the desired function/variable combinations.

library(dplyr)
library(ggplot2)

diamonds %>%
    group_by(cut) %>%
    summarise_each(funs(sum, mean), x:z, price) %>%
    select(cut, matches("[xyz]_sum"), price_mean)
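
Note that summarise_each() and funs() are deprecated in more recent dplyr releases. As a minimal sketch, assuming dplyr 1.0.0 or later (where across() is available), the same result can probably be obtained in a single summarise() call, without computing the unused sum/mean combinations. The name result is only illustrative and is not part of the original answer.

```r
library(dplyr)
library(ggplot2)  # loaded only for the diamonds data set

result <- diamonds %>%
    group_by(cut) %>%
    summarise(
        across(x:z, sum),     # sum every column in the range x:z
        price = mean(price)   # average the one remaining column
    )

result
```

With 17 columns to sum, the x:z range could be replaced by whatever tidyselect expression covers them, for example a wider column range or a helper such as starts_with().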
