Alternative for sample

Problem description

I have the following code, which uses sapply with sample and takes a long time to run because it is executed many times:

samples = sapply(rowIndices, function(idx){
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})

The issue is that the weights for each draw come from a row of the matrix, and which row to use depends on the corresponding index in rowIndices.

Does anybody have a better idea for drawing from the rows of the matrix?

Reproducible example:

rowIndices = floor(runif(1000, 1, 100))
vectorToDrawFrom = runif(5000, 0.0, 2.0)
weights = matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)

timer = 0
for (i in 1:2500){
  ptm = proc.time()
  samples = sapply(rowIndices, function(idx){
    sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
  })
  timer = timer + (proc.time() - ptm)[3]
}

print(timer) # too long!!

Recommended answer

So here is a way I would speed up your code. Two things to note: 1) the sampled values will not "match" rowIndices position by position, though it is trivial to get them back in the right order; 2) you only store the last iteration, though maybe that is just because this is a minimal reproducible example.

Basically, you only need to call sample once per distinct value of rowIndices. Since rowIndices ranges over 1-99, that is at most 99 calls instead of 1000, which provides a substantial speed-up.
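
As a small illustration of that counting argument, the sketch below tabulates how many draws each distinct row index needs. The seed and the name drawsPerRow are assumptions made for this example only; they are not part of the original answer.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
rowIndices <- floor(runif(1000, 1, 100))

# Count how often each row index occurs; each distinct index needs
# only one sample() call, asked for that many values.
drawsPerRow <- table(rowIndices)

length(drawsPerRow)  # number of sample() calls needed (at most 99)
sum(drawsPerRow)     # total number of draws, still 1000
```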

We can simply sort the row indices before we start:

rowIndices <- sort(rowIndices)  # sort the row indices, then loop
for (i in 1:15) {
  # one sample() call per distinct row index, drawing as many
  # values as that index occurs in rowIndices
  samples = unlist(sapply(unique(rowIndices), function(idx) {
    sample(vectorToDrawFrom, sum(rowIndices %in% idx),
           TRUE, weights[idx, ])
  }))
}

Unit: milliseconds
            expr      min       lq     mean   median       uq      max neval cld
      newForLoop 263.5668 266.6329 292.8301 268.8920 275.3378  515.899   100  a 
 OriginalForLoop 698.2982 705.6911 792.2846 712.9985 887.9447 1263.779   100   b
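
The answer does not show the call that produced this table. As a rough way to compare the two approaches yourself, the sketch below times them with base R's system.time. The function names originalDraw and groupedDraw, the seed, and the repetition count are all assumptions made for illustration, and the timings will differ from machine to machine.

```r
set.seed(42)  # arbitrary seed for repeatability
rowIndices <- floor(runif(1000, 1, 100))
vectorToDrawFrom <- runif(5000, 0.0, 2.0)
weights <- matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)

# Original approach: one sample() call per element of rowIndices.
originalDraw <- function() {
  sapply(rowIndices, function(idx) {
    sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
  })
}

# Grouped approach: one sample() call per distinct row index.
# Note that the result follows the sorted index order.
groupedDraw <- function() {
  sortedIdx <- sort(rowIndices)
  unlist(lapply(unique(sortedIdx), function(idx) {
    sample(vectorToDrawFrom, sum(sortedIdx == idx), TRUE, weights[idx, ])
  }))
}

reps <- 5  # hypothetical repetition count, kept small on purpose
timeOriginal <- system.time(for (i in seq_len(reps)) originalDraw())["elapsed"]
timeGrouped  <- system.time(for (i in seq_len(reps)) groupedDraw())["elapsed"]
print(c(original = unname(timeOriginal), grouped = unname(timeGrouped)))
```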

Edit

To keep the original ordering, save the ordering index of the original rowIndices vector, then sort the row indices and proceed. Afterwards, use the saved index to put the samples back in the original order.

set.seed(8675309)
weights = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0), 
                 nrow = 5, ncol = 3, byrow = TRUE)

rowIndices = c(2,1,2,4)
vectorToDrawFrom = runif(3, 0.0, 2.0)

set.seed(8675309)
## This is the original code
sample2 = sapply(rowIndices, function(idx){       
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})

rowIndx <- order(rowIndices)   #get ordering index
rowIndices <- sort(rowIndices) 

set.seed(8675309)
samples = unlist(sapply(unique(rowIndices), function(idx){
  sample(vectorToDrawFrom, sum(rowIndices %in% idx), TRUE, weights[idx, ])
}))

samples = samples[order(rowIndx)]  # undo the sort: back to the original order
all(samples == sample2)
#[1] TRUE
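
The save-sort-restore steps can also be folded into one helper that writes each group of draws straight back into its original positions, so no reordering is needed at the end. This is only a sketch of an alternative, not the answer's own code: the name sampleByRow and its arguments are hypothetical, and it assumes values has at least two elements (sample treats a single number as the range 1:x).

```r
# Hypothetical helper; the name and signature are illustrative only.
sampleByRow <- function(values, rowIndices, weightMatrix) {
  # Positions in rowIndices, grouped by the row they refer to.
  positions <- split(seq_along(rowIndices), rowIndices)
  out <- vector(mode = typeof(values), length = length(rowIndices))
  for (key in names(positions)) {
    pos <- positions[[key]]
    # One sample() call per distinct row, drawing one value per position.
    out[pos] <- sample(values, length(pos), replace = TRUE,
                       prob = weightMatrix[as.integer(key), ])
  }
  out
}

# Usage with the small example from the edit above.
set.seed(8675309)
weights <- matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
                  nrow = 5, ncol = 3, byrow = TRUE)
rowIndices <- c(2, 1, 2, 4)
vectorToDrawFrom <- runif(3, 0.0, 2.0)
restored <- sampleByRow(vectorToDrawFrom, rowIndices, weights)
```

Because the result is filled by position, restored[i] is always drawn with the weights of row rowIndices[i], whatever order the groups are processed in.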
