Calculating a weighted mean using data.table in R with weights in one of the table columns
Problem description
I have the data.table shown below, and I'm trying to calculate weighted means for subsets of the data. I've tried two approaches with the MWE below.
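library(data.table)
set.seed(12345)
# a holds the weights; b..e hold the values to be averaged
dt = data.table(a = c(10, 20, 25, 10, 10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
# add the grouping column by reference, then key the table on it
dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]
setkey(dt, key)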
First, I tried subsetting .SD inside an lapply call, which doesn't work (and I didn't really expect it to):
dt[,lapply(.SD,function(x) weighted.mean(x,.SD[1])),by=key]
Second, I tried defining a function to apply to .SD, as I would if I were using ddply:
wmn=function(x){
tmp = NULL
for(i in 2:ncol(x)){
tmp1 = weighted.mean(x[,i],x[,1])
tmp = c(tmp,tmp1)
}
return(tmp)
}
dt[,wmn,by=key]
Any thoughts on how best to do this?
Thanks
Edit
Corrected an error in which columns the wmn function selects.
Second edit
Reverted the weighted mean formula and added set.seed.
Recommended answer
If you want the weighted means of "b" through "e" using "a" as the weight, I think this does the trick:
dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]]
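To check that the call above does what is intended, here is a minimal sketch on a small deterministic table. The column name grp and all values are hypothetical, chosen so the result can be verified by hand; they are not the question's random data. One difference from the line above: .SDcols is restricted to the value columns, because letters[1:5] also includes "a" and so returns the weighted mean of the weight column against itself.

```r
library(data.table)

# Hypothetical data: grp is the grouping column, a holds the weights,
# b and c hold the values to be averaged.
dt <- data.table(
  grp = c("A", "A", "B", "B", "B"),
  a   = c(10, 20, 25, 10, 10),
  b   = c(1, 2, 3, 4, 5),
  c   = c(2, 4, 6, 8, 10)
)

# Weighted mean of each value column within each group.
# .SDcols limits .SD to b and c; the weight column a is still visible in j.
res <- dt[, lapply(.SD, weighted.mean, w = a), by = grp, .SDcols = c("b", "c")]

# Cross-check group "A" by hand using sum(w * x) / sum(w).
manual_b_A <- (10 * 1 + 20 * 2) / (10 + 20)
check_b_A  <- isTRUE(all.equal(res[grp == "A", b], manual_b_A))
```

Because j is evaluated once per group, w = a picks up only that group's weights, so no explicit subsetting of the weight column is needed.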