R中多列的汇总和加权均值 [英] Aggregate and Weighted Mean for multiple columns in R

查看:203
本文介绍了R中多列的汇总和加权均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题基本上是这样的: R中的聚合和加权均值

The question is basically the samt as this: Aggregate and Weighted Mean in R.

但是我希望它使用data.table在几列上进行计算,因为我有数百万行。像这样:

But i want it to compute it on several columns, using data.table, as I have millions of rows. So something like this:

set.seed(42)   # fix seed so that you get the same results
dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                                    tax=rnorm(20),tax2=rnorm(20), assets=1e7+1e7*runif(20), assets2=1e6+1e7*runif(20))

DT <- data.table(dat)

我可以像这样计算一列资产的加权平均值:

I can compute the weighted mean on one column, assets, like this:

DT[,list(wret = weighted.mean(tax,assets)),by=assetclass]

但是如何同时在资产和资产2上执行呢?

如果有多个列,例如 col = c( assets1, assets2, assets3,。 ..)
而且也可以对税,税1 ...

But how to do it on both assets and assets2?
What if there are several columns, like col=c("assets1", "assets2", "assets3", ... )? And is it also possible to do it for tax, tax1...

推荐答案

几列权重

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                  tax=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=3:4]
#    assetclass      assets       asets2
# 1:          D -0.14179882 -0.003717957
# 2:          B  0.61146928  0.523913589
# 3:          E -0.28037796 -0.147677384
# 4:          C -0.09658125 -0.010338894
# 5:          A  0.74954460  0.750190947

,也可以从 .SD 中排除非权重列:

or you can exclude the non-weight columns from .SD:

DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=-(1:2)]

这里是使用矩阵乘法的变体:

Here is a variant using matrix multiplication:

DT[, as.list(crossprod(as.matrix(.SD), tax)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]

矩阵乘法还可以对几列 tax1 执行此操作tax2 ,...

The matrix multiplication can do it also for several columns tax1, tax2, ...

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                 tax1=rnorm(20), tax2=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, as.list(crossprod(as.matrix(.SD), tax1)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), tax2)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), cbind(tax1, tax2))/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]

这篇关于R中多列的汇总和加权均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆