Weighted means for several columns, by groups (in a data.table)
Problem description
This question follows another one on group weighted means: I would like to create weighted within-group averages using data.table. The difference from the initial question is that the names of the variables to be averaged are specified in a string vector.
The data:
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1 10 100 200 50
1 1 2 5 50 100 200
1 1 3 120 1000 500 250
1 1 4 2 25 100 400
1 1 4 15 125 150 200
2 2 1 1 10 50 150
2 2 2 10 10 10 200
2 2 2 40 40 100 30
2 2 3 20 100 100 10
", header = TRUE, na.strings = "NA")
Using Roland's suggested answer from the aforementioned question:
library(data.table)
dt <- as.data.table(df)
dt2 <- dt[, lapply(.SD, weighted.mean, w = weights), by = list(region, state, county)]
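For reference, the grouped call above applies weighted.mean() to every column of .SD (all non-grouping columns, including weights itself) within each (region, state, county) group, i.e. it computes sum(w * x) / sum(w) per group. A minimal sketch that re-creates the question's data and checks one group by hand; the names g and manual are illustrative only and not part of the original answer:

```r
library(data.table)

# Same data as in the question
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1 10 100 200 50
1 1 2 5 50 100 200
1 1 3 120 1000 500 250
1 1 4 2 25 100 400
1 1 4 15 125 150 200
2 2 1 1 10 50 150
2 2 2 10 10 10 200
2 2 2 40 40 100 30
2 2 3 20 100 100 10
", header = TRUE)
dt <- as.data.table(df)

# Roland's answer: weighted.mean() on every non-grouping column, per group
dt2 <- dt[, lapply(.SD, weighted.mean, w = weights), by = list(region, state, county)]

# Hand check for the two-row group region = 1, state = 1, county = 4:
# weighted mean = sum(w * x) / sum(w)
g <- dt[region == 1 & state == 1 & county == 4]
manual <- sum(g$weights * g$y1980) / sum(g$weights)

print(dt2[region == 1 & state == 1 & county == 4, y1980])
print(manual)
```

Here the group's weights are 2 and 15, so the check works out to (2 * 25 + 15 * 125) / 17 = 1925 / 17.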
I have a character vector that dynamically determines the columns for which I want the within-group weighted average:
colsToKeep = c("y1980","y1990")
But I do not know how to pass it as an argument for the data.table magic.
I tried
dt[, lapply(as.list(colsToKeep), weighted.mean, w = weights), by = list(region, state, county)]
but I then get:
Error in x * w : non-numeric argument to binary operator
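The error comes from what lapply() iterates over: as.list(colsToKeep) is a list of the strings "y1980" and "y1990", so weighted.mean() receives a character string rather than a column, and the multiplication x * w inside it fails. A small base-R sketch, outside data.table, contrasting iteration over column names with iteration over the columns themselves; the one-row data frame toy is hypothetical:

```r
colsToKeep <- c("y1980", "y1990")

# Hypothetical one-row data with the same column names as in the question
toy <- data.frame(weights = 2, y1980 = 25, y1990 = 100)

# Iterating over the names passes the string "y1980" itself to weighted.mean(),
# so the arithmetic inside it fails with an error
err <- tryCatch(
  lapply(as.list(colsToKeep), weighted.mean, w = toy$weights),
  error = function(e) e
)
print(conditionMessage(err))

# Iterating over the columns (a named list of numeric vectors) works,
# and the list names carry through to the result
ok <- lapply(toy[colsToKeep], weighted.mean, w = toy$weights)
print(ok)
```

Inside a data.table j, the same fix amounts to turning the names into the group's column vectors before calling lapply().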
Not sure how to achieve what I want.
Bonus question: I'd like original columns names to be kept, instead of getting V1 and V2.
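On the naming point: lapply() copies the names of its input list onto its output, and data.table labels unnamed list elements returned by j as V1, V2, and so on. An unnamed list such as as.list(colsToKeep) therefore produces V1/V2, while a named list keeps the original names. A short sketch of both behaviours, plus data.table's setnames() for renaming columns after the fact; res is a hypothetical result table:

```r
library(data.table)

colsToKeep <- c("y1980", "y1990")

# as.list() on an unnamed character vector gives an unnamed list,
# so lapply() returns an unnamed list as well
unnamed <- lapply(as.list(colsToKeep), toupper)
# A named input list keeps its names through lapply()
named <- lapply(setNames(as.list(colsToKeep), colsToKeep), toupper)

print(names(unnamed))
print(names(named))

# If a result already has V1/V2, setnames() renames the columns by reference
res <- data.table(county = 1:2, V1 = c(1.5, 2.5), V2 = c(3.5, 4.5))
setnames(res, c("V1", "V2"), colsToKeep)
print(names(res))
```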
NB I use version 1.9.3 of the data.table package.
Solution
Normally, you should be able to do:
dt2 <- dt[,lapply(.SD,weighted.mean,w=weights), by = list(region,state,county), .SDcols = colsToKeep]
i.e., by providing just those columns to .SDcols. But at the moment this won't work due to a bug: the weights column won't be available, because it's not specified in .SDcols.
Until it's fixed, we can accomplish this as follows:
dt2 <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights),
          by = list(region, state, county)]
#    region state county     y1980    y1990
# 1:      1     1      1  100.0000 200.0000
# 2:      1     1      2   50.0000 100.0000
# 3:      1     1      3 1000.0000 500.0000
# 4:      1     1      4  113.2353 144.1176
# 5:      2     2      1   10.0000  50.0000
# 6:      2     2      2   34.0000  82.0000
# 7:      2     2      3  100.0000 100.0000
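A self-contained sketch that runs the mget() workaround on the question's data and checks it against the printed output. It also runs the plain .SDcols version, assuming a data.table release in which the bug described above has been fixed (it is not expected to work on 1.9.3). Both should yield the same table with the original column names kept, because mget() returns a list named after colsToKeep. The object names dt_mget and dt_sdcols are illustrative:

```r
library(data.table)

# Same data as in the question
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1 10 100 200 50
1 1 2 5 50 100 200
1 1 3 120 1000 500 250
1 1 4 2 25 100 400
1 1 4 15 125 150 200
2 2 1 1 10 50 150
2 2 2 10 10 10 200
2 2 2 40 40 100 30
2 2 3 20 100 100 10
", header = TRUE)
dt <- as.data.table(df)
colsToKeep <- c("y1980", "y1990")

# Workaround from the answer: mget() looks up the named columns inside j,
# and the names of the list it returns become the result column names
dt_mget <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights),
              by = list(region, state, county)]

# Plain .SDcols version (assumes a data.table version without the bug,
# where weights is still visible in j although it is not in .SDcols)
dt_sdcols <- dt[, lapply(.SD, weighted.mean, w = weights),
                by = list(region, state, county), .SDcols = colsToKeep]

print(dt_mget)
print(all.equal(dt_mget, dt_sdcols))
```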