Joining aggregated values back to the original data frame
Problem description
One of the design patterns I use over and over is performing a "group by" or "split, apply, combine (SAC)" on a data frame and then joining the aggregated data back to the original data. This is useful, for example, when calculating each county's deviation from the state mean in a data frame with many states and counties. Rarely is my aggregate calculation only a simple mean, but it makes a good example. I often solve this problem the following way:
library(plyr)
set.seed(1)

## set up some data
group1 <- rep(1:3, 4)
group2 <- sample(c("A", "B", "C"), 12, replace = TRUE)
values <- rnorm(12)
df <- data.frame(group1, group2, values)

## got some data, so let's aggregate: one row per group1 with its mean
group1Mean <- ddply(df, "group1", function(x)
  data.frame(meanValue = mean(x$values)))

## join the per-group means back onto every original row (joins on group1)
df <- merge(df, group1Mean)
df
Which produces nice aggregate data like the following:
> df
   group1 group2   values meanValue
1       1      A  0.48743 -0.121033
2       1      A -0.04493 -0.121033
3       1      C -0.62124 -0.121033
4       1      C -0.30539 -0.121033
5       2      A  1.51178  0.004804
6       2      B  0.73832  0.004804
7       2      A -0.01619  0.004804
8       2      B -2.21470  0.004804
9       3      B  1.12493  0.758598
10      3      C  0.38984  0.758598
11      3      B  0.57578  0.758598
12      3      A  0.94384  0.758598
This works, but are there alternative ways of doing this which improve on readability, performance, etc?
Recommended answer
One line of code does the trick:
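## transform() runs on each group1 subset, so mean(values) is that group's mean;
## the new column has to be named, since transform() ignores unnamed arguments
new <- ddply(df, "group1", transform, meanValue = mean(values))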
new
   group1 group2      values    meanValue
1       1      A  0.48742905 -0.121033381
2       1      A -0.04493361 -0.121033381
3       1      C -0.62124058 -0.121033381
4       1      C -0.30538839 -0.121033381
5       2      A  1.51178117  0.004803931
6       2      B  0.73832471  0.004803931
7       2      A -0.01619026  0.004803931
8       2      B -2.21469989  0.004803931
9       3      B  1.12493092  0.758597929
10      3      C  0.38984324  0.758597929
11      3      B  0.57578135  0.758597929
12      3      A  0.94383621  0.758597929
identical(df, new)
[1] TRUE
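If you would rather not depend on plyr, base R's ave() may be worth a look. This is only a minimal sketch, not part of the answer above: it rebuilds the question's example data under the name dat (chosen here for illustration), attaches the group mean to every row without a separate merge step, and then computes the deviation from that mean mentioned in the question. The deviation column name is likewise an assumption.

```r
## Base R sketch: ave() returns a vector the same length as its input,
## with each element replaced by the summary of the group it belongs to.
set.seed(1)
group1 <- rep(1:3, 4)
group2 <- sample(c("A", "B", "C"), 12, replace = TRUE)
values <- rnorm(12)
dat <- data.frame(group1, group2, values)  # 'dat' is a name picked for this sketch

## group mean repeated on every row of its group; rows keep their original order
dat$meanValue <- ave(dat$values, dat$group1, FUN = mean)

## deviation of each row from its group mean ('deviation' is a hypothetical name)
dat$deviation <- dat$values - dat$meanValue

## sort by group only for display, to mirror the merged output above
dat[order(dat$group1), ]
```

Unlike merge(), ave() leaves the rows in their original order, so sort explicitly if you need the grouped layout. For aggregates that return more than one value per group, the ddply/merge pattern from the question is probably still the simpler fit.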