使用data.table聚合子总计和总计 [英] Aggregating sub totals and grand totals with data.table

查看:169
本文介绍了使用data.table聚合子总计和总计的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在R中有一个 data.table

library(data.table)
set.seed(1)
DT = data.table(
  group=sample(letters[1:2],100,replace=TRUE), 
  year=sample(2010:2012,100,replace=TRUE),
  v=runif(100))

按组和年度将此数据汇总到摘要表中简单而优雅:

Aggregating this data into a summary table by group and year is simple and elegant:

table <- DT[,mean(v),by='group, year']

数据到汇总表,包括小计和总计,是一个更困难一点,并且很少优雅:

However, aggregating this data into a summary table, including subtotals and grand totals, is a little more difficult, and a lot less elegant:

library(plyr)
yearTot <- DT[,list(mean(v),year='Total'),by='group']
groupTot <- DT[,list(mean(v),group='Total'),by='year']
Tot <- DT[,list(mean(v), year='Total', group='Total')]
table <- rbind.fill(table,yearTot,groupTot,Tot)
table$group[table$group==1] <- 'Total'
table$year[table$year==1] <- 'Total'

这会产生:

table[order(table$group, table$year), ]


b $ b

有一个简单的方法用data.table指定小计和总计,例如plyr的 margins = TRUE 命令?我更喜欢使用data.table over plyr在我的数据集,因为它是一个非常大的数据集,我已经在data.table格式。

Is there a simple way to specify subtotals and grand totals with data.table, such as the margins=TRUE command for plyr? I would prefer to use data.table over plyr on my dataset, as it is a very large dataset that I already have in the data.table format.

推荐答案

我不知道一个简单的方法。这里是一个实现的第一刺。我不知道plyr中的 margin = TRUE 是什么?

I'm not aware of a simple way. Here's a first stab at an implementation. I don't know margins=TRUE in plyr, is this what that does?

crossby = function(DT, j, by) {
    j = substitute(j)
    ans = rbind(
        DT[,eval(j),by],
        DT[,list("Total",eval(j)),by=by[1]],
        cbind("Total",DT[,eval(j),by=by[2]]),
        list("Total","Total",DT[,eval(j)]),
        use.names=FALSE
        # 'use.names' argument added in data.table v1.8.0
    )
    setkeyv(ans,by)
    ans
}

crossby(DT, mean(v), c("group","year"))

      group  year        V1
 [1,]     a  2010 0.2926945
 [2,]     a  2011 0.4176346
 [3,]     a  2012 0.4227796
 [4,]     a Total 0.3901875
 [5,]     b  2010 0.5231845
 [6,]     b  2011 0.4997119
 [7,]     b  2012 0.4306871
 [8,]     b Total 0.4835788
 [9,] Total  2010 0.4278093
[10,] Total  2011 0.4463616
[11,] Total  2012 0.4271160
[12,] Total Total 0.4350153

这篇关于使用data.table聚合子总计和总计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆