R bootstrap weighted mean by group with data table
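Question
I am trying to combine two approaches from earlier posts: bootstrapping a weighted mean, and bootstrapping a mean by group with data.table.
Here is some random data: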
## Generate sample data
# Function to randomly generate weights
set.seed(7)
rtnorm <- function(n, mean, sd, a = -Inf, b = Inf){
    qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}
# Generate variables
nps <- round(runif(3500, min=-1, max=1), 0) # nps value which takes 1, 0 or -1
group <- sample(letters[1:11], 3500, TRUE) # groups
weight <- rtnorm(n=3500, mean=1, sd=1, a=0.04, b=16) # weights between 0.04 and 16
# Build data frame
df = data.frame(group, nps, weight)
# The following packages / libraries are required:
require("data.table")
require("boot")
This is the code from the first post, bootstrapping the weighted mean:
samplewmean <- function(d, i, j) {
    d <- d[i, ]
    w <- j[i, ]
    return(weighted.mean(d, w))
}
results_qsec <- boot(data = df[, 2, drop = FALSE],
                     statistic = samplewmean,
                     R = 10000,
                     j = df[, 3, drop = FALSE])
This works fine.
Below is the code from the second post, bootstrapping the mean by group within a data.table:
dt = data.table(df)
stat <- function(x, i) {x[i, (m=mean(nps))]}
dt[, list(list(boot(.SD, stat, R = 100))), by = group]$V1
This also works fine.
However, I am unable to combine the two approaches.
Running...
dt[, list(list(boot(.SD, samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1
...gives the error message:
Error in weighted.mean.default(d, w) :
'x' and 'w' must have the same length
Running...
dt[, list(list(boot(dt[, 2 , drop = FALSE], samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1
...gives a different error:
Error in weighted.mean.default(d, w) :
(list) object cannot be coerced to type 'double'
I still have trouble understanding how arguments are handled in data.table and how to combine functions that operate on a data.table.
Any help would be appreciated.
Recommended answer
This comes down to how data.table behaves inside a function. With a one-column data.frame, d[i, ] drops to a plain vector, which is why the original samplewmean works. A data.table never drops: d is still a data.table inside samplewmean even after subsetting with i, whereas weighted.mean expects numeric vectors for both the values and the weights. Calling unlist on both before weighted.mean fixes this error:
Error in weighted.mean.default(d, w) : (list) object cannot be coerced to type 'double'
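The difference in drop behaviour can be checked directly. This is a minimal sketch on hypothetical toy data (toy_df, toy_dt and idx are illustrative names, not part of the question's sample data):

```r
library(data.table)

# Hypothetical toy data, not the question's sample data
toy_df <- data.frame(nps = c(1, 0, -1, 1))
toy_dt <- as.data.table(toy_df)
idx <- c(1, 1, 3)            # a resample of row indices, as boot() would supply

sub_df <- toy_df[idx, ]      # one-column data.frame drops to a plain vector
sub_dt <- toy_dt[idx, ]      # data.table keeps its class

is.numeric(sub_df)           # TRUE
is.data.table(sub_dt)        # TRUE
is.numeric(unlist(sub_dt))   # TRUE: unlist flattens it to a numeric vector
```

The unlist call is what turns the one-column data.table into something weighted.mean can accept.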
Code that unlists before passing into weighted.mean:
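samplewmean <- function(d, i, j) {
    d <- d[i, ]
    w <- j[i, ]
    # d and w are still one-column data.tables here, so flatten them to vectors
    return(weighted.mean(unlist(d), unlist(w)))
}
# Use .SD so that values and weights both come from the current group, not the whole table
dt[, list(list(boot(.SD[, .(nps)], samplewmean, R = 5000, j = .SD[, .(weight)]))), by = group]$V1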
A more data.table-like syntax (data.table version >= v1.10.2) is probably as follows:
# boot() passes the resampled row indices as the second positional argument.
# valCol and wgtCol are matched by name, so the indices land in the remaining parameter i.
samplewmean <- function(d, valCol, wgtCol, i) {
    weighted.mean(unlist(d[i, ..valCol]), unlist(d[i, ..wgtCol]))
}
dt[, list(list(boot(.SD, statistic=samplewmean, R=1, valCol="nps", wgtCol="weight"))), by=group]$V1
Another possible syntax passes the column names unquoted (see data.table FAQ 1.6):
samplewmean <- function(d, valCol, wgtCol, i) {
    # i holds the resampled row indices supplied by boot()
    weighted.mean(unlist(d[i, eval(substitute(valCol))]), unlist(d[i, eval(substitute(wgtCol))]))
}
dt[, list(list(boot(.SD, statistic=samplewmean, R=1, valCol=nps, wgtCol=weight))), by=group]$V1
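Each call above returns a list of boot objects, one per group. To work with the results, it may help to keep the group label next to each object and pull out the point estimate and a bootstrap standard error. This is a sketch under assumptions: toy, wmean_stat, fit and summary_dt are hypothetical names, and the toy data stands in for the question's dt.

```r
library(data.table)
library(boot)

set.seed(7)
# Hypothetical toy data with the same column names as the question
toy <- data.table(
  group  = rep(c("a", "b"), each = 50),
  nps    = sample(c(-1, 0, 1), 100, replace = TRUE),
  weight = runif(100, 0.04, 16)
)

# Statistic: weighted mean computed on the resampled rows i
wmean_stat <- function(d, i) {
  weighted.mean(d$nps[i], d$weight[i])
}

# One boot object per group, stored in a list column next to its group label
res <- toy[, .(fit = list(boot(.SD, wmean_stat, R = 200))), by = group]

# t0 is the statistic on the original rows; t holds the bootstrap replicates
summary_dt <- res[, .(estimate = fit[[1]]$t0, se = sd(fit[[1]]$t)), by = group]
print(summary_dt)
```

Taking the columns directly with d$nps[i] avoids the unlist step, at the cost of hard-coding the column names in the statistic.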