R bootstrap weighted mean by group with data table


Question

I am trying to combine two approaches:

  1. Bootstrapping multiple columns in data.table

  2. Bootstrap weighted mean in R

Here is some random data:

## Generate sample data

set.seed(7)

# Draw n values from a normal distribution truncated to [a, b]
# (used below to randomly generate weights)
rtnorm <- function(n, mean, sd, a = -Inf, b = Inf){
  qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}

# Generate variables
nps    <- round(runif(3500, min=-1, max=1), 0) # nps value which takes 1, 0 or -1
group  <- sample(letters[1:11], 3500, TRUE) # groups
weight <- rtnorm(n=3500, mean=1, sd=1, a=0.04, b=16) # weights between 0.04 and 16

# Build data frame
df = data.frame(group, nps, weight)

# The following packages / libraries are required:
require("data.table")
require("boot")

This is the code from the weighted-mean post above, which bootstraps the weighted mean:

samplewmean <- function(d, i, j) {
  d <- d[i, ]
  w <- j[i, ]
  return(weighted.mean(d, w))   
}

results_qsec <- boot(data= df[, 2, drop = FALSE], 
                     statistic = samplewmean, 
                     R=10000, 
                     j = df[, 3 , drop = FALSE])

This works perfectly fine.

Below is the code from the data.table post above, which bootstraps the mean by group within a data table:

dt = data.table(df)
stat <- function(x, i) {x[i, (m=mean(nps))]}
dt[, list(list(boot(.SD, stat, R = 100))), by = group]$V1

This works fine, too.

I have not been able to combine the two approaches:

Running ...

dt[, list(list(boot(.SD, samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1

... produces an error message:

Error in weighted.mean.default(d, w) : 
  'x' and 'w' must have the same length

Running ...

dt[, list(list(boot(dt[, 2 , drop = FALSE], samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1

... produces a different error:

Error in weighted.mean.default(d, w) : 
  (list) object cannot be coerced to type 'double'

I still have trouble getting my head around the arguments in data.table and how to combine functions that run inside data.table.

Any help would be much appreciated.

Recommended answer

This is related to how data.table behaves within the scope of a function. Inside samplewmean, d is still a data.table even after subsetting with i, whereas weighted.mean expects numeric vectors for both the values and the weights. Calling unlist on both before passing them to weighted.mean fixes this error:

Error in weighted.mean.default(d, w) : (list) object cannot be coerced to type 'double'
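
The difference can be seen directly by subsetting a one-column data.frame and a one-column data.table in the same way. This is only a minimal sketch with made-up toy values (df1, dt1, idx and w are illustrative names, not part of the question), assuming the data.table package is installed:

```r
library(data.table)

# Toy one-column inputs (hypothetical values, for illustration only)
df1 <- data.frame(nps = c(1, 0, -1, 1))
dt1 <- as.data.table(df1)
idx <- c(1, 1, 3, 4)        # resampled row indices, as boot() would supply them
w   <- c(0.5, 1, 2, 1.5)    # toy weights

from_df <- df1[idx, ]  # data.frame: the single column is dropped to a plain vector
from_dt <- dt1[idx, ]  # data.table: the result stays a data.table (a list of columns)

is.numeric(from_df)     # the data.frame subset is a numeric vector
is.data.table(from_dt)  # the data.table subset is still a data.table

# unlist() flattens the one-column data.table into a numeric vector
# that weighted.mean() accepts
wm <- weighted.mean(unlist(from_dt), w)
```

Extracting the column with from_dt[["nps"]] or from_dt$nps would give the same numeric vector without unlist.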

Code that unlists before passing the values into weighted.mean:

samplewmean <- function(d, i, j) {
  d <- d[i, ]
  w <- j[i, ]
  return(weighted.mean(unlist(d), unlist(w)))   
}

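# Note: dt here is the full table, so each group is bootstrapped from all rows
# rather than from that group's own rows
dt[, list(list(boot(dt[, 2 , drop = FALSE], samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1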

A more data.table-like syntax (data.table version >= v1.10.2) is probably as follows:

# boot() passes the resampled row indices as the second positional argument (i);
# the statistic has to subset by them, otherwise every replicate equals the original estimate
samplewmean <- function(d, i, valCol, wgtCol) {
    d <- d[i]
    weighted.mean(unlist(d[, ..valCol]), unlist(d[, ..wgtCol]))
}

# R=1 only checks that the call runs; increase R for real use
dt[, list(list(boot(.SD, statistic=samplewmean, R=1, valCol="nps", wgtCol="weight"))), by=group]$V1
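
boot() calls the statistic with the data as the first argument and a vector of resampled row indices as the second positional argument. Extra named arguments such as valCol are matched by name first, so the indices end up in whichever formal argument is left over. The sketch below, using hypothetical names (x, ignores_idx, uses_idx) and simulated data, illustrates what happens when a statistic does not use those indices:

```r
library(boot)

set.seed(1)
x <- data.frame(v = rnorm(50))  # simulated toy data

# Statistic that receives the indices in idx but never uses them
ignores_idx <- function(d, valCol, idx) mean(d[[valCol]])

# Statistic that resamples the rows with the indices boot() supplies
uses_idx <- function(d, valCol, idx) mean(d[idx, valCol])

b1 <- boot(x, ignores_idx, R = 200, valCol = "v")
b2 <- boot(x, uses_idx,    R = 200, valCol = "v")

sd_ignored <- sd(b1$t)  # zero: every replicate equals the original estimate
sd_used    <- sd(b2$t)  # positive: replicates vary across resamples
```

With R=1 this problem is easy to miss, because a single replicate gives no sense of the spread.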

Another possible syntax (see data.table FAQ 1.6) is:

# i receives the resampled row indices from boot(); subset by them before computing the statistic
samplewmean <- function(d, i, valCol, wgtCol) {
    d <- d[i]
    weighted.mean(unlist(d[, eval(substitute(valCol))]), unlist(d[, eval(substitute(wgtCol))]))
}

dt[, list(list(boot(.SD, statistic=samplewmean, R=1, valCol=nps, wgtCol=weight))), by=group]$V1
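
Once there is one boot object per group, the point estimate and a bootstrap standard error can be pulled out of the list column. The following is a rough sketch rather than a definitive implementation: toy, wmean_stat, res and summary_dt are hypothetical names, the data are simulated, and the column names nps and weight are hard-coded to mirror the question.

```r
library(data.table)
library(boot)

set.seed(7)
# Small simulated data set with the same column names as in the question
toy <- data.table(
  group  = sample(letters[1:3], 300, TRUE),
  nps    = sample(c(-1, 0, 1), 300, TRUE),
  weight = runif(300, min = 0.04, max = 16)
)

# Weighted mean of the resampled rows; i holds the indices supplied by boot()
wmean_stat <- function(d, i) {
  d <- d[i]
  weighted.mean(d$nps, d$weight)
}

# One boot object per group, stored in a list column
res <- toy[, .(b = list(boot(.SD, wmean_stat, R = 200))), by = group]

# Point estimate (t0) and bootstrap standard error (sd of the replicates t) per group
summary_dt <- res[, .(group,
                      estimate = sapply(b, function(x) x$t0),
                      se       = sapply(b, function(x) sd(x$t)))]
summary_dt
```

Each element of res$b is an ordinary boot object, so functions such as boot.ci from the boot package can be applied to it per group in the same way.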
