Weighted means for several columns, by groups (in a data.table)


Question

This question follows another one on group weighted means: I would like to create weighted within-group averages using data.table. The difference from the initial question is that the names of the variables to be averaged are specified in a string vector.

The data:

df <- read.table(text= "
          region    state  county  weights y1980  y1990  y2000
             1        1       1       10     100    200     50
             1        1       2        5      50    100    200
             1        1       3      120    1000    500    250
             1        1       4        2      25    100    400
             1        1       4       15     125    150    200
             2        2       1        1      10     50    150
             2        2       2       10      10     10    200
             2        2       2       40      40    100     30
             2        2       3       20     100    100     10
", header=TRUE, na.strings=NA)

Using Roland's suggested answer from the aforementioned question:

library(data.table)
dt <- as.data.table(df)
dt2 <- dt[,lapply(.SD,weighted.mean,w=weights),by=list(region,state,county)]

I have a character vector that dynamically determines the columns for which I want the within-group weighted average:

colsToKeep = c("y1980","y1990")

But I do not know how to pass it as an argument to the data.table magic.

I tried

dt[,lapply(
      as.list(colsToKeep),weighted.mean,w=weights),
      by=list(region,state,county)]

but I then get:

Error in x * w : non-numeric argument to binary operator

Not sure how to achieve what I want.
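The error is easier to see outside data.table. Below is a minimal base-R sketch, assuming a group such as region 1, state 1, county 1 (a single row with weight 10 in the data above): lapply() walks over as.list(colsToKeep), so weighted.mean() receives the string "y1980" rather than the y1980 column, and the multiplication inside weighted.mean() fails. The variable name first_arg is illustrative only.

```r
# Sketch: what lapply(as.list(colsToKeep), weighted.mean, w = weights)
# hands to weighted.mean() for a one-row group with weight 10.
colsToKeep <- c("y1980", "y1990")
weights <- 10  # that group's weight, taken from the data above

# lapply() iterates over the character strings themselves, not the columns.
first_arg <- as.list(colsToKeep)[[1]]
print(class(first_arg))  # "character"

# weighted.mean() then computes x * w with x a string, which errors.
msg <- tryCatch(weighted.mean(first_arg, w = weights),
                error = function(e) conditionMessage(e))
print(msg)  # "non-numeric argument to binary operator"
```

So the column names have to be resolved to the actual column vectors before weighted.mean() sees them.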

Bonus question: I'd like the original column names to be kept, instead of getting V1 and V2.

NB I use version 1.9.3 of the data.table package.

Solution

Normally, you should be able to do:

dt2 <- dt[,lapply(.SD,weighted.mean,w=weights), 
          by = list(region,state,county), .SDcols = colsToKeep]

i.e., by providing just those columns to .SDcols. But at the time of writing (data.table 1.9.3, as in the question), this won't work due to a bug: the weights column isn't available in j because it isn't listed in .SDcols.

Until it's fixed, we can accomplish this as follows:

dt2 <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights), 
            by = list(region, state, county)]
#    region state county     y1980    y1990
# 1:      1     1      1  100.0000 200.0000
# 2:      1     1      2   50.0000 100.0000
# 3:      1     1      3 1000.0000 500.0000
# 4:      1     1      4  113.2353 144.1176
# 5:      2     2      1   10.0000  50.0000
# 6:      2     2      2   34.0000  82.0000
# 7:      2     2      3  100.0000 100.0000
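As a sanity check on the table above, here is a minimal base-R sketch (not the data.table approach) that recomputes the same grouped weighted means with split() and lapply(). It also touches the bonus question: because the selected columns are iterated as a named list, the result keeps the names y1980 and y1990 instead of V1 and V2, the same reason lapply(mget(colsToKeep), ...) keeps them inside j. The names byCols, groups, rows and out are hypothetical helpers for this sketch.

```r
# Base-R cross-check of the grouped weighted means (no data.table needed).
df <- read.table(text = "
region state county weights y1980 y1990 y2000
     1     1      1      10   100   200    50
     1     1      2       5    50   100   200
     1     1      3     120  1000   500   250
     1     1      4       2    25   100   400
     1     1      4      15   125   150   200
     2     2      1       1    10    50   150
     2     2      2      10    10    10   200
     2     2      2      40    40   100    30
     2     2      3      20   100   100    10
", header = TRUE)

colsToKeep <- c("y1980", "y1990")
byCols <- c("region", "state", "county")  # hypothetical helper name

# One data frame per (region, state, county) combination that actually occurs.
groups <- split(df, df[byCols], drop = TRUE)

rows <- lapply(groups, function(g) {
  # g[colsToKeep] is a named list of columns, so lapply() keeps the
  # names y1980 / y1990 in its result.
  means <- lapply(g[colsToKeep], weighted.mean, w = g$weights)
  data.frame(g[1, byCols], means)
})

# Stack the one-row results and sort them like the data.table output.
out <- do.call(rbind, rows)
out <- out[order(out$region, out$state, out$county), ]
rownames(out) <- NULL
print(out)
```

For county 4 of region 1, for example, this gives (2 * 25 + 15 * 125) / 17 = 1925 / 17 for y1980, matching the 113.2353 in the data.table result.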
