Bootstrap by groups with the boot package
Question
I have a data frame, "my.dataset", like this:
ID Species SEX Category V1 V2 V3
87790 Caniceps F F_Caniceps -0.34 -0.55 0.61
199486 Caniceps F F_Caniceps -0.34 -0.56 0.63
199490 Caniceps F F_Caniceps -0.37 -0.54 0.57
199493 Caniceps F F_Caniceps -0.35 -0.54 0.58
200139 Caniceps F F_Caniceps -0.39 -0.51 0.51
393151 Caniceps M M_Caniceps -0.36 -0.56 0.55
393154 Caniceps M M_Caniceps -0.36 -0.55 0.55
486210 Caniceps M M_Caniceps -0.41 -0.50 0.45
811945 Hyemalis F F_Hyemalis -0.35 -0.54 0.55
811947 Hyemalis F F_Hyemalis -0.35 -0.59 0.62
15661 Hyemalis M M_Hyemalis -0.34 -0.56 0.62
15662 Hyemalis M M_Hyemalis -0.35 -0.53 0.53
15663 Hyemalis M M_Hyemalis -0.33 -0.58 0.68
15664 Vulcani F F_Vulcani -0.29 -0.57 0.71
15665 Vulcani F F_Vulcani -0.29 -0.56 0.67
15666 Vulcani F F_Vulcani -0.28 -0.55 0.70
486218 Vulcani F F_Vulcani -0.36 -0.55 0.56
486224 Vulcani F F_Vulcani -0.36 -0.54 0.56
486212 Vulcani M M_Vulcani -0.37 -0.53 0.53
486213 Vulcani M M_Vulcani -0.37 -0.53 0.54
199479 Vulcani M M_Vulcani -0.33 -0.57 0.61
199483 Vulcani M M_Vulcani -0.33 -0.62 0.69
199484 Vulcani M M_Vulcani -0.33 -0.60 0.65
I'm trying to run a bootstrap with boot() to compute a statistic over the variables "V1", "V2" and "V3", something like:
boot(my.dataset, statistic=lda (formula=lda(SEX~V1+V2+V3, data=my.dataset), R=3, sim = "ordinary")
But I need each resample to contain the same number of individuals per level of the "Category" variable of "my.dataset". Any idea how to do this?
Recommended answer
You are looking for the strata argument of boot(). This is called a stratified bootstrap. Remark: I'm not sure that your boot() call is correct; I would suggest something like:
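library(boot)  # provides boot()
library(MASS)  # provides lda()
statfun <- function(d, i) lda(SEX ~ V1 + V2 + V3, data = d[i, ])  # i holds the resampled row indices
res <- boot(my.dataset, statfun, R = 100, strata = factor(my.dataset$Species))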
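Note that lda() returns a fitted-model object (a list of class "lda"), whereas the statistic function must return a single number or a numeric vector for boot() to work properly. So statfun has to return a numeric summary extracted from the fit (for example its coefficients or an error rate) rather than the fit itself.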
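One way to meet that requirement is to reduce each fit to a number inside the statistic function. The sketch below is only an illustration under stated assumptions: the choice of statistic (the share of rows of the full data set that the refitted model misclassifies), the name err_fun and the seed are not from the question, and it stratifies on "Category" because that is the grouping the question asks for.

```r
library(boot)  # boot()
library(MASS)  # lda() and its predict() method

# Data copied from the question; text columns are read as factors.
my.dataset <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
ID Species SEX Category V1 V2 V3
87790 Caniceps F F_Caniceps -0.34 -0.55 0.61
199486 Caniceps F F_Caniceps -0.34 -0.56 0.63
199490 Caniceps F F_Caniceps -0.37 -0.54 0.57
199493 Caniceps F F_Caniceps -0.35 -0.54 0.58
200139 Caniceps F F_Caniceps -0.39 -0.51 0.51
393151 Caniceps M M_Caniceps -0.36 -0.56 0.55
393154 Caniceps M M_Caniceps -0.36 -0.55 0.55
486210 Caniceps M M_Caniceps -0.41 -0.50 0.45
811945 Hyemalis F F_Hyemalis -0.35 -0.54 0.55
811947 Hyemalis F F_Hyemalis -0.35 -0.59 0.62
15661 Hyemalis M M_Hyemalis -0.34 -0.56 0.62
15662 Hyemalis M M_Hyemalis -0.35 -0.53 0.53
15663 Hyemalis M M_Hyemalis -0.33 -0.58 0.68
15664 Vulcani F F_Vulcani -0.29 -0.57 0.71
15665 Vulcani F F_Vulcani -0.29 -0.56 0.67
15666 Vulcani F F_Vulcani -0.28 -0.55 0.70
486218 Vulcani F F_Vulcani -0.36 -0.55 0.56
486224 Vulcani F F_Vulcani -0.36 -0.54 0.56
486212 Vulcani M M_Vulcani -0.37 -0.53 0.53
486213 Vulcani M M_Vulcani -0.37 -0.53 0.54
199479 Vulcani M M_Vulcani -0.33 -0.57 0.61
199483 Vulcani M M_Vulcani -0.33 -0.62 0.69
199484 Vulcani M M_Vulcani -0.33 -0.60 0.65
")

# Hypothetical statistic (an assumption, not from the question):
# fit the LDA on the resampled rows d[i, ], then return the fraction
# of rows of the full data d whose sex it predicts wrongly.
err_fun <- function(d, i) {
  fit  <- lda(SEX ~ V1 + V2 + V3, data = d[i, ])
  pred <- predict(fit, newdata = d)$class
  mean(as.character(pred) != as.character(d$SEX))
}

set.seed(42)  # arbitrary seed, only for reproducibility
res <- boot(my.dataset, err_fun, R = 100, strata = my.dataset$Category)

res$t0                             # statistic on the original data
quantile(res$t, c(0.025, 0.975))   # spread across the 100 replicates
```

Because err_fun returns a single number, res$t is a 100 x 1 matrix. A statistic that returns a numeric vector of length k would give a 100 x k matrix instead.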
With strata, the rows are resampled within each level of the factor, so every level appears in each replicate with exactly its original number of observations. An ordinary bootstrap gives no such guarantee, and it can fail with an error because some levels are missing from some replicates and the model cannot be fitted.
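The difference can be checked directly by bootstrapping the group counts themselves. This is a minimal sketch that uses only the group sizes from the table in the question; count_fun and the seed are hypothetical choices made for the illustration.

```r
library(boot)

# Group sizes taken from the "Category" column of the question's table.
category <- factor(rep(
  c("F_Caniceps", "M_Caniceps", "F_Hyemalis", "M_Hyemalis", "F_Vulcani", "M_Vulcani"),
  times = c(5, 3, 2, 3, 5, 5)
))
d <- data.frame(Category = category)

# Hypothetical helper: number of resampled rows in each category.
count_fun <- function(d, i) as.vector(table(d$Category[i]))

set.seed(1)  # arbitrary seed, only for reproducibility
plain <- boot(d, count_fun, R = 200)                       # ordinary bootstrap
strat <- boot(d, count_fun, R = 200, strata = d$Category)  # stratified bootstrap

# Stratified: every replicate reproduces the original counts (expected TRUE).
all(sweep(strat$t, 2, strat$t0) == 0)

# Ordinary: counts vary between replicates; this shows their range per category.
apply(plain$t, 2, range)
```

A zero in the first row of the last result would mean that a category dropped out of at least one ordinary resample entirely, which is the situation that makes the model fit fail.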
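Note: in the strata argument you have to give the name of the data frame again (my.dataset$Species), because boot() does not look the column up in the data. To stratify by the "Category" variable from the question, pass my.dataset$Category instead.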