Bootstrapping/resampling a matrix by row in R


Question

I have a matrix x with 20 rows and 10 columns. I need to sample (with replacement) 5 rows at a time and calculate the column means. I need to repeat this procedure 15 times and report the column means each time.

As an example, I used the resample library in R to do this.

library("resample")

# Create a random matrix
set.seed(1234)
x <- matrix( round(rnorm(200, 5)), ncol=10)

## Bootstrap 15 times by resampling 5 rows at a time.
k <- bootstrap(x,colMeans,B = 15,block.size=5)

My concern with the above procedure is that I'm not sure whether the rows are kept intact, meaning that the column means are calculated within the 5 selected rows. My second question is whether block.size in the above function randomly selects 5 rows, calculates colMeans, and repeats this 15 times, with the results reported in replicates as shown below.

 k$replicates
      stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 stat10
 [1,]  4.65  4.50  4.65  5.25  5.25  5.05  4.90  5.60  4.85   5.20
 [2,]  4.60  4.65  4.80  5.60  5.50  5.20  5.05  5.10  5.00   5.40
 [3,]  4.90  4.35  4.55  5.20  5.80  4.80  4.60  5.30  5.15   4.20
 [4,]  4.75  4.65  4.15  5.30  5.25  4.80  4.70  5.15  5.55   4.35
 [5,]  4.55  4.65  4.50  5.40  5.40  4.90  4.85  5.55  5.00   4.75
 [6,]  4.65  4.25  5.00  5.35  5.20  5.05  4.95  5.20  4.75   5.20
 [7,]  4.70  4.30  4.75  5.35  5.50  4.75  5.00  5.45  4.85   4.75
 [8,]  4.75  4.15  4.95  5.10  5.55  4.70  4.70  5.30  5.05   4.90
 [9,]  4.40  4.30  4.50  5.25  5.50  4.70  4.75  5.35  4.95   4.85
[10,]  4.85  4.50  4.35  5.25  5.70  4.75  4.65  5.35  4.95   4.10
[11,]  4.35  4.50  4.65  5.30  5.20  4.75  4.85  5.30  5.20   5.20
[12,]  4.25  4.55  5.20  5.00  5.45  4.80  4.90  5.15  5.30   5.00
[13,]  4.30  4.70  4.55  5.05  5.35  4.85  5.00  4.90  5.75   4.60
[14,]  4.70  4.35  4.95  5.25  5.40  4.85  4.90  5.20  5.40   5.20
[15,]  4.55  4.70  4.40  5.15  5.20  4.70  4.80  5.45  6.00   4.90
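The printed replicates allow a quick arithmetic check on the second question. Because x holds whole numbers, a column mean over 5 rows has to be a multiple of 1/5, whereas a mean over all 20 rows is a multiple of 1/20. The sketch below applies that test to the first replicate row shown above; `is_mean_of_n` is a hypothetical helper written for this illustration, not part of the resample package.

```r
# First replicate row, copied from the k$replicates output above
rep1 <- c(4.65, 4.50, 4.65, 5.25, 5.25, 5.05, 4.90, 5.60, 4.85, 5.20)

# Hypothetical helper: TRUE where v could be the mean of n whole numbers,
# i.e. where v * n is (numerically) an integer
is_mean_of_n <- function(v, n) abs(v * n - round(v * n)) < 1e-8

fits_5  <- all(is_mean_of_n(rep1, 5))   # could every value be a mean of 5 rows?
fits_20 <- all(is_mean_of_n(rep1, 20))  # could every value be a mean of 20 rows?

fits_5   # FALSE: e.g. 4.65 * 5 = 23.25 is not a whole number
fits_20  # TRUE
```

This suggests each replicate was computed from 20 resampled rows rather than 5. That would fit block.size controlling how many replicates are computed at a time rather than how many rows are drawn, but that reading of the argument is an assumption here and is worth confirming in ?bootstrap.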

I'm not restricted to this function or package; any other suggestions would be greatly appreciated.

Many thanks.

Recommended answer

Without using a package, you could do it like this:

# your data
set.seed(1234)
x <- matrix( round(rnorm(200, 5)), ncol=10)

# reset seed for this sampling exercise; define sample size and number of iterations
set.seed(1)
samp_size <- 5
iter <- 15

# here are 15 blocks of 5 numbers, which will index rows of your matrix x
samp_mat <- matrix(sample(1:nrow(x), samp_size*iter, replace=TRUE),
                   ncol=samp_size, byrow=TRUE)

# example, look at the first 3 blocks:
samp_mat[1:3,]

#       [,1] [,2] [,3] [,4] [,5]
# [1,]    6    8   12   19    5
# [2,]   18   19   14   13    2
# [3,]    5    4   14    8   16

# so, you can get the colMeans for the first block like this
# (i.e., colMeans for rows 6  8 12 19  5, in this case)
colMeans(x[samp_mat[1,],])

# for all 15 blocks:
t(apply(samp_mat, 1, function(i) colMeans(x[i,])))

...and if you want to smush it all into one statement, you can:

t(apply(matrix(sample(1:nrow(x), 5*15, replace=TRUE), ncol=5, byrow=TRUE), 1,
        function(i) colMeans(x[i, ])))

(but that's obviously less readable)
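If you need this more than once, the same idea can be wrapped in a small function. The following is only a sketch: `boot_col_means` and its argument names are made up for this illustration, and it uses `replicate()` instead of a pre-built index matrix, so for a given seed it will not necessarily draw the same rows as the code above.

```r
# Hypothetical helper: draw `size` whole rows of x with replacement,
# take the column means, and repeat `iter` times
boot_col_means <- function(x, size, iter) {
  t(replicate(iter, {
    rows <- sample(nrow(x), size, replace = TRUE)  # row indices, repeats allowed
    colMeans(x[rows, , drop = FALSE])              # means within the drawn rows only
  }))
}

# same data as in the question
set.seed(1234)
x <- matrix(round(rnorm(200, 5)), ncol = 10)

set.seed(1)
res <- boot_col_means(x, size = 5, iter = 15)
dim(res)  # 15 10: one row of column means per iteration
```

Because each iteration indexes complete rows, every column mean in a given replicate comes from the same 5 rows, which addresses the "rows kept intact" concern.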
