Bootstrapping multiple columns in data.table in a scalable fashion (R)

Problem description

This is a follow-up to an earlier question. In that question, the OP wanted to bootstrap two fixed columns, x1 and x2:

library(data.table)
library(boot)
set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
# statistic for boot(): means of x1 and x2 over the resampled rows i
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

However, I think this problem can be nicely extended to handle any number of columns by treating them as groups. For instance, let's use the iris dataset. Say I want to calculate the bootstrap mean of all four measurements for each species. I can use melt to reshape the data to long format and then group by the Species, variable combination to get the means in one go. I think this approach will scale well.

data(iris)
iris = data.table(iris)
iris[,mean(Sepal.Length),by=Species]
iris[, ID := .I]  # row index as ID; .N would assign the row count (150) to every row
iris_deep = melt(iris
                 ,id.vars = c("ID","Species")
                 ,measure.vars = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"))
#define a mean bootstrap function
stat <- function(x, i) {x[i, m=mean(value),]}
iris_deep[, list(list(boot(.SD, stat, R = 100))), by = list(Species,variable)]$V1

The code above is my attempt at doing this. However, the bootstrapping part does not work; R throws the following error:

Error in mean(value) : object 'value' not found

Could someone please take a look at this?

Recommended answer

I tried this, with parentheses added around m = mean(value), and it appears to work:

stat <- function(x, i) {x[i, (m=mean(value))]}
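
A likely reason for the original error, not stated in the answer itself: without the parentheses, m = mean(value) is parsed as a named argument to `[` and not as the j expression, so mean(value) is evaluated as an ordinary R argument, where no object called value exists. The sketch below puts the fix into the full pipeline from the question. The names iris_dt, fit, observed and boot_se are illustrative, and using .I for the row ID is an assumption about what the question intended.

```r
library(data.table)
library(boot)

set.seed(1000)

# Reshape iris to long format: one row per (flower, measurement) pair.
iris_dt <- as.data.table(iris)
iris_dt[, ID := .I]  # assumption: a unique row ID was intended
iris_deep <- melt(
  iris_dt,
  id.vars = c("ID", "Species"),
  measure.vars = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
)

# Statistic for boot(): x is the per-group data.table (.SD), i the resampled rows.
# mean(value) sits in the j position, so `value` is looked up as a column of x.
stat <- function(x, i) x[i, mean(value)]

# One boot object per Species/variable group, stored in a list column.
res <- iris_deep[, .(fit = list(boot(.SD, stat, R = 100))), by = .(Species, variable)]

# Summarise each boot object: t0 is the observed mean, t holds the R replicates.
res[, observed := vapply(fit, function(b) b$t0, numeric(1))]
res[, boot_se := vapply(fit, function(b) sd(b$t), numeric(1))]

print(res[, .(Species, variable, observed, boot_se)])
```

Because the measurement columns become levels of variable after melt, adding more columns only adds groups; stat and the grouping call stay the same.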
