How does one aggregate and summarize data quickly?

Problem description

I have a dataset whose headers look like so:

PID Time Site Rep Count

I want to sum Count by Rep for each PID x Time x Site combo.

Then, on the resulting data.frame, I want to get the mean value of Count for each PID x Time x Site combo.

My current function is as follows:

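dummy <- function(data)
{
  # Step 1: total Count per PID x Time x Site x Rep (rows with NA Count are dropped by the formula interface)
  A <- aggregate(Count ~ PID + Time + Site + Rep, data = data, function(x) {sum(na.omit(x))})
  # Step 2: mean of those per-Rep totals per PID x Time x Site
  B <- aggregate(Count ~ PID + Time + Site, data = A, mean)
  return(B)
}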
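To make the two steps concrete, here is a small hypothetical data set (the `toy` values are invented for illustration) run through the question's `dummy()` function, repeated so the block runs on its own. At Site A the per-Rep totals are 5 and 4, so the mean is 4.5. At Site B the totals are 5 and 7, giving 6. The formula interface of `aggregate()` drops rows with `NA` in `Count` by default (`na.action = na.omit`), so the `NA` row never reaches the summing function.

```r
# The question's two-step function, repeated so this block is self-contained
dummy <- function(data) {
  # Step 1: total Count per PID x Time x Site x Rep
  A <- aggregate(Count ~ PID + Time + Site + Rep, data = data,
                 FUN = function(x) sum(na.omit(x)))
  # Step 2: mean of those per-Rep totals per PID x Time x Site
  B <- aggregate(Count ~ PID + Time + Site, data = A, FUN = mean)
  B
}

# Hypothetical toy data using the question's column names
toy <- data.frame(
  PID   = c(1, 1, 1, 1, 1, 1),
  Time  = c(1, 1, 1, 1, 1, 1),
  Site  = c("A", "A", "A", "A", "B", "B"),
  Rep   = c(1, 1, 2, 2, 1, 2),
  Count = c(2, 3, 4, NA, 5, 7)
)

res <- dummy(toy)
print(res)
# Site A: Rep totals 5 and 4 -> mean 4.5; Site B: Rep totals 5 and 7 -> mean 6
```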

This is painfully slow (the original data.frame is 510000 x 20). Is there a way to speed this up with plyr?

Solution

You should look at the package data.table for faster aggregation operations on large data frames. For your problem, the solution would look like:

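library(data.table)
data_t = data.table(data)  # 'data' is the question's data.frame
# A: sum of Count, B: mean of Count, for each PID x Time x Site combo
ans = data_t[, list(A = sum(Count), B = mean(Count)), by = 'PID,Time,Site']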
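The one-pass answer above averages the raw `Count` rows. That equals the mean of the per-Rep totals only when each PID x Time x Site x Rep combination has a single row. The following is a minimal sketch that mirrors the question's two `aggregate()` calls in data.table, assuming the same column names. The `toy` data and the names `dt`, `A`, `B` and `B2` are hypothetical.

```r
library(data.table)

# Hypothetical toy data using the question's column names
toy <- data.frame(
  PID   = c(1, 1, 1, 1, 1, 1),
  Time  = c(1, 1, 1, 1, 1, 1),
  Site  = c("A", "A", "A", "A", "B", "B"),
  Rep   = c(1, 1, 2, 2, 1, 2),
  Count = c(2, 3, 4, NA, 5, 7)
)
dt <- as.data.table(toy)

# Step 1: total Count per PID x Time x Site x Rep (NA values ignored)
A <- dt[, list(Count = sum(Count, na.rm = TRUE)), by = list(PID, Time, Site, Rep)]

# Step 2: mean of the per-Rep totals per PID x Time x Site
B <- A[, list(Count = mean(Count)), by = list(PID, Time, Site)]

# The same two steps chained into one expression, without naming the intermediate table
B2 <- dt[, list(Count = sum(Count, na.rm = TRUE)), by = list(PID, Time, Site, Rep)
         ][, list(Count = mean(Count)), by = list(PID, Time, Site)]

print(B)
```

Here `na.rm = TRUE` plays the role of `na.omit()`, with one difference. A PID x Time x Site x Rep group whose Counts are all `NA` yields 0, whereas the formula interface of `aggregate()` would drop that group entirely.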
