Bootstrapping multiple columns in data.table in a scalable fashion (R)
Question
This is a follow-up question to this one. In the original question, the OP wanted to bootstrap two fixed columns, x1 and x2:
library(data.table)
library(boot)
set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1
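A short sketch of what the two-column call returns, assuming the data.table and boot packages are installed. The names `res` and `b` are illustrative only. Each element of `V1` should be a `boot` object, so the usual fields `t0` (statistics on the original data) and `t` (one row per bootstrap replicate) are available.

```r
library(data.table)
library(boot)

set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
stat <- function(x, i) x[i, c(m1 = mean(x1), m2 = mean(x2))]

# One boot object per value of `group` (TRUE / FALSE)
res <- data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

b <- res[[1]]
b$t0               # the two means computed on the original rows of this group
dim(b$t)           # R = 10 replicates by 2 statistics
apply(b$t, 2, sd)  # bootstrap standard error of each mean
```

Note that every additional column needs its own `mean()` call inside `stat`, which is the scaling problem the question goes on to address.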
However, I think this problem can be nicely extended to handle any number of columns by treating them as groups. For instance, let's use the iris dataset. Say I want to calculate the bootstrap mean of all four measurements for each species. I can use melt to reshape the data into long format and then group by the Species, variable combination to get every mean in one go; I think this approach will scale well.
library(data.table)
library(boot)
data(iris)
iris = data.table(iris)
iris[,mean(Sepal.Length),by=Species]
iris[,ID:=.I,]  # .I is the row number; .N would give every row the same ID (150)
iris_deep = melt(iris
                 ,id.vars = c("ID","Species")
                 ,measure.vars = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"))
#define a mean bootstrap function
stat <- function(x, i) {x[i, m=mean(value),]}
iris_deep[, list(list(boot(.SD, stat, R = 100))), by = list(Species,variable)]$V1
Here is my attempt. However, the bootstrapping step does not work; R throws the following error:
Error in mean(value) : object 'value' not found
Could someone please take a look?
Answer
I tried this (with added parentheses enclosing m=mean(value)) and it appears to work:
stat <- function(x, i) {x[i, (m=mean(value))]}
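A likely explanation, offered as a hedged reading rather than a confirmed diagnosis: `[.data.table` has no `...` argument, so in `x[i, m=mean(value),]` the named argument `m` is partially matched to the formal argument `mult`, and `mean(value)` is then evaluated outside the table, where `value` does not exist. Wrapping it in parentheses turns it into a single `j` expression (an assignment whose value is returned). The sketch below checks the partial match and then uses an unnamed `j` instead, summarising each group as its observed mean plus a bootstrap standard error; `dt_formals`, `boots` and `summary_dt` are hypothetical names, and the SE summary is an illustrative addition, not part of the original answer.

```r
library(data.table)
library(boot)

# Which formal argument of `[.data.table` does a bare `m =` match?
dt_formals <- names(formals(data.table:::"[.data.table"))
dt_formals[pmatch("m", dt_formals)]

iris_dt <- as.data.table(iris)
iris_dt[, ID := .I]
iris_deep <- melt(iris_dt, id.vars = c("ID", "Species"),
                  measure.vars = c("Sepal.Length", "Sepal.Width",
                                   "Petal.Length", "Petal.Width"))

# Unnamed j: nothing can be mistaken for an argument of `[`
stat <- function(x, i) x[i, mean(value)]

set.seed(1)
boots <- iris_deep[, list(b = list(boot(.SD, stat, R = 100))),
                   by = list(Species, variable)]

# One row per group: observed mean (t0) and SD of the replicates (t)
summary_dt <- boots[, list(mean = b[[1]]$t0, se = sd(b[[1]]$t[, 1])),
                    by = list(Species, variable)]
summary_dt
```

Because the statistic only ever touches the `value` column, adding more measurement columns changes only `measure.vars`, not `stat`.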