Aggregating sub totals and grand totals with data.table
Question
I've got a data.table in R:
library(data.table)
set.seed(1)
DT = data.table(
  group = sample(letters[1:2], 100, replace = TRUE),
  year = sample(2010:2012, 100, replace = TRUE),
  v = runif(100))
Aggregating this data into a summary table by group and year is simple and elegant:
table <- DT[,mean(v),by='group, year']
However, aggregating this data into a summary table that includes subtotals and grand totals is a little more difficult, and a lot less elegant:
library(plyr)
yearTot <- DT[,list(mean(v),year='Total'),by='group']
groupTot <- DT[,list(mean(v),group='Total'),by='year']
Tot <- DT[,list(mean(v), year='Total', group='Total')]
table <- rbind.fill(table,yearTot,groupTot,Tot)
table$group[table$group==1] <- 'Total'
table$year[table$year==1] <- 'Total'
This yields:
table[order(table$group, table$year), ]
Is there a simple way to specify subtotals and grand totals with data.table, such as the margins=TRUE option for plyr? I would prefer to use data.table over plyr, because my dataset is very large and is already in data.table format.
Recommended answer
In recent development versions of data.table, you can use a new feature called "grouping sets" to produce subtotals:
library(data.table)
set.seed(1)
DT = data.table(
  group = sample(letters[1:2], 100, replace = TRUE),
  year = sample(2010:2012, 100, replace = TRUE),
  v = runif(100))
cube(DT, mean(v), by=c("group","year"))
#     group year        V1
#  1:     a 2011 0.4176346
#  2:     b 2010 0.5231845
#  3:     b 2012 0.4306871
#  4:     b 2011 0.4997119
#  5:     a 2012 0.4227796
#  6:     a 2010 0.2926945
#  7:    NA 2011 0.4463616
#  8:    NA 2010 0.4278093
#  9:    NA 2012 0.4271160
# 10:     a   NA 0.3901875
# 11:     b   NA 0.4835788
# 12:    NA   NA 0.4350153
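cube() computes every combination of the by columns. If only some of those levels are wanted, the same family of functions offers groupingsets(), which takes the sets explicitly, and rollup(), which drops by columns from the right. The following is a minimal sketch rather than part of the original answer; the column name mean_v and the object names gs and ru are illustrative choices.

```r
library(data.table)

set.seed(1)
DT <- data.table(
  group = sample(letters[1:2], 100, replace = TRUE),
  year  = sample(2010:2012, 100, replace = TRUE),
  v     = runif(100)
)

# Spell out the sets that cube() would generate for two columns:
# (group, year), (group), (year) and the grand total (empty set).
gs <- groupingsets(
  DT,
  j    = list(mean_v = mean(v)),   # mean_v is an illustrative name
  by   = c("group", "year"),
  sets = list(c("group", "year"), "group", "year", character()),
  id   = TRUE
)

# rollup() is hierarchical: (group, year), then (group), then the grand total.
# It does not produce the per-year subtotal across groups.
ru <- rollup(DT, j = list(mean_v = mean(v)), by = c("group", "year"), id = TRUE)

print(gs)
print(ru)
```

Removing a set from the sets list drops that level of subtotal, which can save work on a large table when not every margin is needed.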
cube(DT, mean(v), by=c("group","year"), id=TRUE)
#     grouping group year        V1
#  1:        0     a 2011 0.4176346
#  2:        0     b 2010 0.5231845
#  3:        0     b 2012 0.4306871
#  4:        0     b 2011 0.4997119
#  5:        0     a 2012 0.4227796
#  6:        0     a 2010 0.2926945
#  7:        2    NA 2011 0.4463616
#  8:        2    NA 2010 0.4278093
#  9:        2    NA 2012 0.4271160
# 10:        1     a   NA 0.3901875
# 11:        1     b   NA 0.4835788
# 12:        3    NA   NA 0.4350153
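To get the 'Total' labels the question asks for, the grouping column from id=TRUE can be used to decide which rows are subtotals. What follows is a hedged sketch, not part of the original answer. It assumes the encoding shown in the output above (1 = year aggregated, 2 = group aggregated, 3 = both), and the names res and mean_v are illustrative.

```r
library(data.table)

set.seed(1)
DT <- data.table(
  group = sample(letters[1:2], 100, replace = TRUE),
  year  = sample(2010:2012, 100, replace = TRUE),
  v     = runif(100)
)

res <- cube(DT, list(mean_v = mean(v)), by = c("group", "year"), id = TRUE)

# year is integer, so convert it to character before writing a text label.
res[, year := as.character(year)]

# Label aggregated rows using the grouping id rather than is.na(), so that
# genuine NA values in the data would not be mistaken for subtotals.
res[grouping %in% c(2L, 3L), group := "Total"]  # group was aggregated away
res[grouping %in% c(1L, 3L), year := "Total"]   # year was aggregated away

res[, grouping := NULL]
print(res)
```

Relying on grouping instead of testing for NA matters mainly when the by columns can contain missing values of their own.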