聚合具有不同功能的多个变量 [英] Aggregate multiple variables with different functions

查看:87
本文介绍了聚合具有不同功能的多个变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我有下表 DataTable

Cat1    |   Cat2    |   Val1    |   Val2
--------------------------------------------
A       |   A       |   1       |   2
A       |   B       |   3       |   4
B       |   A       |   5       |   6
B       |   B       |   7       |   8
A       |   A       |   2       |   4
A       |   B       |   6       |   8
B       |   A       |   10      |   12
B       |   B       |   14      |   16

我想分别按Cat1和Cat2进行汇总,分别取Val1和Val2的总和和平均值,我该如何实现呢?

Which I wanted to Aggregate by Cat1 and Cat2, taking the Sum and Avg of Val1 and Val2 respectively, how might I acheive this?

Cat1    |   Cat2    | Sum Val1  | Avg Val2
--------------------------------------------
A       |   A       |   3       |   3
A       |   B       |   9       |   6
B       |   A       |   15      |   9
B       |   B       |   21      |   12

我已经实现了具有聚合函数的单变量聚合:

I've achieved single variable aggregation with the aggregate function:

aggregate(
        Val1
    ~   Cat1 + Cat2
    data=DataTable,
    FUNC=sum
)

但是尽管与cbind玩耍,却无法获得我想要的行为。我要学习R的时间是24小时,所以我对这些概念还不够熟悉,无法完全理解我一直在做的事情(总是很危险!),但是认为这必须很简单。
|

but despite playing around with cbind, can't get the behaviour I want. I'm 24 hrs into learning R, so I'm not familiar enough with the concepts to fully understand what I've been doing (always dangerous!) but think this must be simple to achieve. |

推荐答案

set.seed(45)
df <- data.frame(c1=rep(c("A","A","B","B"), 2), 
                 c2 = rep(c("A","B"), 4), 
                 v1 = sample(8), 
                 v2 = sample(1:100, 8))
> df
#   c1 c2 v1 v2
# 1  A  A  6 19
# 2  A  B  3  1
# 3  B  A  2 37
# 4  B  B  8 86
# 5  A  A  5 30
# 6  A  B  1 44
# 7  B  A  7 41
# 8  B  B  4 39

v1 <- aggregate( v1 ~ c1 + c2, data = df, sum)
v2 <- aggregate( v2 ~ c1 + c2, data = df, mean)
out <- merge(v1, v2, by=c("c1","c2"))
> out
#   c1 c2 v1   v2
# 1  A  A 11 24.5
# 2  A  B  4 22.5
# 3  B  A  9 39.0
# 4  B  B 12 62.5

**编辑:** 我建议您使用 data.table 可以使事情变得非常简单:

**** I'd propose that you use data.table as it makes things really easy:

require(data.table)
dt <- data.table(df)
dt.out <- dt[, list(s.v1=sum(v1), m.v2=mean(v2)), 
                  by=c("c1","c2")]
> dt.out

#    c1 c2 s.v1 m.v2
# 1:  A  A   11 24.5
# 2:  A  B    4 22.5
# 3:  B  A    9 39.0
# 4:  B  B   12 62.5

这篇关于聚合具有不同功能的多个变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆