Weighted means for several columns, by groups (in a data.table)

Posted 2017/3/12 12:15:25. Tags: r, data.table


Question

This question follows another one on group weighted means: I would like to create weighted within-group averages using data.table. The difference from the initial question is that the names of the variables to be averaged are specified in a string vector.

The data:

df <- read.table(text= "
          region    state  county  weights y1980  y1990  y2000
             1        1       1       10     100    200     50
             1        1       2        5      50    100    200
             1        1       3      120    1000    500    250
             1        1       4        2      25    100    400
             1        1       4       15     125    150    200
             2        2       1        1      10     50    150
             2        2       2       10      10     10    200
             2        2       2       40      40    100     30
             2        2       3       20     100    100     10
", header=TRUE, na.strings=NA)

Using Roland's suggested answer from the aforementioned question:

library(data.table)
dt <- as.data.table(df)
dt2 <- dt[,lapply(.SD,weighted.mean,w=weights),by=list(region,state,county)]
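For reference, within each group g (one distinct combination of region, state and county), `weighted.mean(x, w = weights)` computes the ordinary weighted mean below; the worked line uses the two rows of region 1, state 1, county 4 from the data above. Note also that `.SD` holds every non-grouping column by default, so this call returns weighted means of `weights` and `y2000` as well, which is why a way to restrict the columns is needed.

$$
\bar{x}_g = \frac{\sum_{i \in g} w_i\, x_i}{\sum_{i \in g} w_i},
\qquad
\bar{x}^{\,\text{y1980}}_{(1,1,4)} = \frac{2 \cdot 25 + 15 \cdot 125}{2 + 15} = \frac{1925}{17} \approx 113.2353
$$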

I have a character vector that dynamically determines the columns for which I want the within-group weighted average:

colsToKeep = c("y1980","y1990")

But I do not know how to pass it as an argument for the data.table magic.

I tried

 dt[,lapply(
      as.list(colsToKeep),weighted.mean,w=weights),
      by=list(region,state,county)]

but I then get:

Error in x * w : non-numeric argument to binary operator

Not sure how to achieve what I want.
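The error arises because `as.list(colsToKeep)` hands `weighted.mean()` the strings "y1980" and "y1990" themselves, not the columns they name, so the multiplication `x * w` inside `weighted.mean()` receives a character value. A minimal base-R sketch of the same failure, assuming a hypothetical one-row group with weight 10 and y1980 value 100 (the values of region 1, state 1, county 1 in the data above); the variable names `w`, `err` and `ok` are illustrative only:

```r
# Hypothetical one-row group: weight 10, y1980 value 100
# (taken from region 1, state 1, county 1 in the question's data)
w <- 10

# Passing the column *name*: weighted.mean() ends up computing "y1980" * 10,
# which signals an error instead of returning a number
err <- tryCatch(weighted.mean("y1980", w = w), error = function(e) e)
print(conditionMessage(err))

# Passing the column *values* works as intended
ok <- weighted.mean(100, w = w)
print(ok)  # [1] 100
```

In other words, the names in `colsToKeep` have to be turned into the column values before `lapply()` calls `weighted.mean()` on them.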

Bonus question: I'd like the original column names to be kept, instead of getting V1 and V2.

NB: I use version 1.9.3 of the data.table package.

Solution

Normally, you should be able to do:

dt2 <- dt[,lapply(.SD,weighted.mean,w=weights), 
          by = list(region,state,county), .SDcols = colsToKeep]

i.e., by supplying just those columns to .SDcols. At the moment, however, this won't work due to a bug: the weights column isn't available in j because it is not listed in .SDcols.

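Until it's fixed, we can use mget(), which returns a named list of the actual column vectors. lapply() then works on the data rather than on the names, and the result keeps the original column names (which also answers the bonus question):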

dt2 <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights), 
            by = list(region, state, county)]
#    region state county     y1980    y1990
# 1:      1     1      1  100.0000 200.0000
# 2:      1     1      2   50.0000 100.0000
# 3:      1     1      3 1000.0000 500.0000
# 4:      1     1      4  113.2353 144.1176
# 5:      2     2      1   10.0000  50.0000
# 6:      2     2      2   34.0000  82.0000
# 7:      2     2      3  100.0000 100.0000
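As a cross-check that does not depend on data.table, the same grouped weighted means can be sketched in base R. This is not the answer's method, just an equivalent computation under the assumption that only the question's `df` is available: each column is looked up by name with `[[` (the step the `as.list(colsToKeep)` attempt was missing), and iterating over `setNames(nm = colsToKeep)` keeps the names y1980 and y1990. The helper names `byCols`, `groups` and `res` are illustrative. The y1980 and y1990 values should match the data.table output above.

```r
# Data from the question
df <- read.table(text = "
region state county weights y1980 y1990 y2000
     1     1      1      10   100   200    50
     1     1      2       5    50   100   200
     1     1      3     120  1000   500   250
     1     1      4       2    25   100   400
     1     1      4      15   125   150   200
     2     2      1       1    10    50   150
     2     2      2      10    10    10   200
     2     2      2      40    40   100    30
     2     2      3      20   100   100    10
", header = TRUE)

colsToKeep <- c("y1980", "y1990")
byCols <- c("region", "state", "county")  # illustrative helper name

# One data frame per observed (region, state, county) combination;
# drop = TRUE skips combinations that do not occur in the data
groups <- split(df, df[byCols], drop = TRUE)

res <- do.call(rbind, lapply(groups, function(g) {
  # setNames(nm = ...) names each element after its column, so the result
  # columns are called y1980 and y1990 rather than V1 and V2
  means <- lapply(setNames(nm = colsToKeep),
                  function(col) weighted.mean(g[[col]], w = g$weights))  # look up by name
  # Keep the group keys from the first row, then append the weighted means
  data.frame(g[1, byCols], means)
}))

# split() orders groups by interaction level, so sort by the grouping keys
res <- res[order(res$region, res$state, res$county), ]
rownames(res) <- NULL
print(res)
```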

