Bootstrap by groups with the boot package

Question

I have a "my.dataset" like this:

   ID    Species  SEX     Category     V1      V2     V3
87790   Caniceps    F   F_Caniceps  -0.34   -0.55   0.61
199486  Caniceps    F   F_Caniceps  -0.34   -0.56   0.63
199490  Caniceps    F   F_Caniceps  -0.37   -0.54   0.57
199493  Caniceps    F   F_Caniceps  -0.35   -0.54   0.58
200139  Caniceps    F   F_Caniceps  -0.39   -0.51   0.51
393151  Caniceps    M   M_Caniceps  -0.36   -0.56   0.55
393154  Caniceps    M   M_Caniceps  -0.36   -0.55   0.55
486210  Caniceps    M   M_Caniceps  -0.41   -0.50   0.45
811945  Hyemalis    F   F_Hyemalis  -0.35   -0.54   0.55
811947  Hyemalis    F   F_Hyemalis  -0.35   -0.59   0.62
 15661  Hyemalis    M   M_Hyemalis  -0.34   -0.56   0.62
 15662  Hyemalis    M   M_Hyemalis  -0.35   -0.53   0.53
 15663  Hyemalis    M   M_Hyemalis  -0.33   -0.58   0.68
 15664  Vulcani     F   F_Vulcani   -0.29   -0.57   0.71
 15665  Vulcani     F   F_Vulcani   -0.29   -0.56   0.67
 15666  Vulcani     F   F_Vulcani   -0.28   -0.55   0.70
486218  Vulcani     F   F_Vulcani   -0.36   -0.55   0.56
486224  Vulcani     F   F_Vulcani   -0.36   -0.54   0.56
486212  Vulcani     M   M_Vulcani   -0.37   -0.53   0.53
486213  Vulcani     M   M_Vulcani   -0.37   -0.53   0.54
199479  Vulcani     M   M_Vulcani   -0.33   -0.57   0.61
199483  Vulcani     M   M_Vulcani   -0.33   -0.62   0.69
199484  Vulcani     M   M_Vulcani   -0.33   -0.60   0.65

I'm trying to perform a bootstrap with boot() to compute a statistic over variables "V1", "V2" and "V3", something like:

boot(my.dataset, statistic=lda (formula=lda(SEX~V1+V2+V3, data=my.dataset), R=3, sim = "ordinary")

But I need each resample to contain the same number of individuals per level of the "Category" variable as "my.dataset" does. Any idea how to do this?

Recommended answer

You are looking for the "strata" argument of boot(). This is called a stratified bootstrap. Remark: I'm not sure that your boot() call is correct; I would suggest something like:

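library(boot); library(MASS)  # boot() and lda()
# Return a numeric vector (here the LD1 coefficients), not the lda object itself
statfun <- function(d, i) as.vector(lda(SEX ~ V1 + V2 + V3, data = d[i, ])$scaling)
res <- boot(my.dataset, statfun, R = 100, strata = factor(my.dataset$Category))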

Note that lda() returns a fitted model object (a list), whereas the statistic function passed to boot() must return a single number or a numeric vector for the bootstrap to work properly. The statistic function therefore has to extract the numeric quantity of interest from the fit.
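As a minimal sketch of that point, the following fits lda() on made-up data and pulls out one possible numeric summary. The toy data frame and the choice of the discriminant coefficients (the "scaling" component) are assumptions for illustration; any other numeric quantity derived from the fit would serve equally well.

```r
library(MASS)  # lda()

set.seed(1)
# Hypothetical toy data, not the dataset from the question
toy <- data.frame(
  SEX = factor(rep(c("F", "M"), each = 10)),
  V1 = rnorm(20), V2 = rnorm(20), V3 = rnorm(20)
)

fit <- lda(SEX ~ V1 + V2 + V3, data = toy)
class(fit)   # a fitted-model object, stored as a list
names(fit)   # its components, e.g. prior, means, scaling

# One possible numeric summary: coefficients of the first discriminant
coefs <- as.vector(fit$scaling)
coefs
```

A statistic function for boot() would wrap the last two steps and return a vector such as coefs.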

With this method each level of the strata factor is resampled separately, so every level appears in each replicate with exactly its original number of observations. The ordinary bootstrap gives no such guarantee: a level can be missing from some replicates, in which case the model cannot be fitted and an error results.
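The effect of "strata" can be checked directly. The sketch below uses a hypothetical toy data frame and a hypothetical statistic, count_fun, that simply counts how many resampled rows fall in each group; neither comes from the question.

```r
library(boot)  # boot() ships with standard R installations

set.seed(42)
# Hypothetical toy data: three groups of unequal size
toy <- data.frame(
  Category = factor(rep(c("F_A", "M_A", "F_B"), times = c(5, 3, 2))),
  V1 = rnorm(10)
)

# Statistic: number of resampled rows in each Category level
count_fun <- function(d, i) as.vector(table(d$Category[i]))

strat <- boot(toy, count_fun, R = 200, strata = toy$Category)
plain <- boot(toy, count_fun, R = 200)

# Stratified: does every replicate reproduce the original group sizes?
all(t(strat$t) == strat$t0)
# Ordinary: range of the first group's size across replicates
range(plain$t[, 1])
```

In the stratified run each row of strat$t matches the original counts in strat$t0, while in the ordinary run the group sizes are free to vary from replicate to replicate.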

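Note: in the strata argument you must name the data frame again, i.e. write the column with the my.dataset$ prefix rather than as a bare column name.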
