How does one aggregate and summarize data quickly?
Problem description
I have a dataset whose headers look like this:
PID Time Site Rep Count
I want to sum Count by Rep for each PID x Time x Site combination. On the resulting data.frame, I then want to get the mean value of Count for each PID x Time x Site combination.
The current function is as follows:
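dummy <- function(data) {
  # Sum Count within each PID x Time x Site x Rep group, ignoring NAs
  A <- aggregate(Count ~ PID + Time + Site + Rep, data = data, function(x) sum(na.omit(x)))
  # Average the per-Rep sums within each PID x Time x Site group
  B <- aggregate(Count ~ PID + Time + Site, data = A, mean)
  return(B)
}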
This is painfully slow (the original data.frame is 510000 x 20). Is there a way to speed this up with plyr?
Solution
You should look at the data.table package for faster aggregation operations on large data frames. For your problem, the solution would look like:
library(data.table)
# data_tab is the original data.frame; convert it to a data.table
data_t = data.table(data_tab)
# Sum and mean of Count per PID x Time x Site group (column name as in the question's header)
ans = data_t[, list(A = sum(Count), B = mean(Count)), by = 'PID,Time,Site']
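Note that a single grouped call over PID, Time, Site takes the sum and mean over all rows in each group, whereas the question asks for two steps: sum by Rep first, then average those sums. Below is a minimal sketch of the two-step version in data.table. The toy data and the names toy, rep_sums and result are made up for illustration and are not from the original post.

```r
library(data.table)

# Hypothetical toy data: 2 PIDs x 2 Times x 1 Site x 2 Reps,
# two rows per Rep, with one NA in Count.
toy <- data.table(
  PID   = rep(c("p1", "p2"), each = 8),
  Time  = rep(rep(c(1, 2), each = 4), times = 2),
  Site  = "s1",
  Rep   = rep(rep(c("a", "b"), each = 2), times = 4),
  Count = c(1, 2, 3, NA, 5, 6, 7, 8, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Step 1: sum Count within each PID x Time x Site x Rep group, dropping NAs.
rep_sums <- toy[, .(Count = sum(Count, na.rm = TRUE)),
                by = .(PID, Time, Site, Rep)]

# Step 2: average the per-Rep sums within each PID x Time x Site group.
result <- rep_sums[, .(Count = mean(Count)), by = .(PID, Time, Site)]
print(result)
```

One difference from the aggregate() version worth checking on real data: with the formula interface, aggregate() drops rows containing NA before grouping, so a group whose Count values are all NA disappears, while sum(..., na.rm = TRUE) here would report 0 for that group.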