Alternative for sample
Question
I have the following code, which calls sample inside sapply and takes a long time to run because it is executed many times:
samples = sapply(rowIndices, function(idx){
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})
The issue is that I have to draw using the weights stored in the matrix, and the row of weights to use depends on the indices in rowIndices.
Does anybody have a better idea for drawing from the rows of the matrix?
Reproducible example:
rowIndices = floor(runif(1000, 1, 100))
vectorToDrawFrom = runif(5000, 0.0, 2.0)
weights = matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)
timer = 0
for (i in 1:2500){
  ptm = proc.time()
  samples = sapply(rowIndices, function(idx){
    sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
  })
  timer = timer + (proc.time() - ptm)[3]
}
print(timer) # too long!!
Recommended answer
Here is a way I would speed up your code. Two things to note: 1) the sampled values will not "match" rowIndices element by element, though it is trivial to put them back in the right order; 2) you only store the last iteration, though maybe that is just because this is a minimal reproducible example.
Basically, you only need to call sample once per distinct value of rowIndices. Since rowIndices ranges from 1 to 99, that is at most 99 calls instead of 1000, which provides a large speed-up.
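To make the saving concrete, the short sketch below counts how many sample calls each approach needs for rowIndices built as in the question. The seed and the name perRowCounts are illustrative assumptions, not part of the original post.

```r
set.seed(1)  # arbitrary seed, only so the illustration is repeatable
rowIndices <- floor(runif(1000, 1, 100))  # same construction as in the question

# Original approach: one sample() call per element of rowIndices.
length(rowIndices)

# Grouped approach: one sample() call per distinct row index.
length(unique(rowIndices))

# Number of draws needed from each row; these become the `size` argument
# of the single sample() call made for that row.
perRowCounts <- table(rowIndices)
head(perRowCounts)
```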
We can just sort the row indices before we start:
rowIndices <- sort(rowIndices) ## sort the row indices and then loop
for (i in 1:15){
  samples = unlist(sapply(unique(rowIndices),
    function(idx){
      sample(vectorToDrawFrom, sum(rowIndices %in% idx),
             TRUE, weights[idx, ])
    }))
}
Unit: milliseconds
            expr      min       lq     mean   median       uq      max neval cld
      newForLoop 263.5668 266.6329 292.8301 268.8920 275.3378  515.899   100   a
 OriginalForLoop 698.2982 705.6911 792.2846 712.9985 887.9447 1263.779   100   b
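A rough way to check this comparison without extra packages is base R's system.time. The sketch below is an assumption about how such a comparison could be set up, not the benchmark that produced the table; the function names originalVersion and groupedVersion, the seed, and the choice of 15 repetitions are illustrative.

```r
set.seed(123)  # arbitrary seed for a repeatable setup
rowIndices <- floor(runif(1000, 1, 100))
vectorToDrawFrom <- runif(5000, 0.0, 2.0)
weights <- matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)

# One sample() call per element of rowIndices, as in the question.
originalVersion <- function() {
  sapply(rowIndices, function(idx) {
    sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
  })
}

# One sample() call per distinct row index, as in the answer.
groupedVersion <- function() {
  sortedIndices <- sort(rowIndices)
  unlist(lapply(unique(sortedIndices), function(idx) {
    sample(vectorToDrawFrom, sum(sortedIndices == idx), TRUE, weights[idx, ])
  }))
}

reps <- 15
timeOriginal <- system.time(for (i in seq_len(reps)) originalVersion())[["elapsed"]]
timeGrouped <- system.time(for (i in seq_len(reps)) groupedVersion())[["elapsed"]]

# Elapsed seconds for each version; actual values depend on the machine.
c(original = timeOriginal, grouped = timeGrouped)
```

Here lapply is used instead of sapply so the result is always a list before unlist, regardless of how many draws each row needs.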
Edit
The way to maintain the original vector ordering is to save the ordering index of the original rowIndices vector (via order) before sorting. Then sort the row indices, sample as before, and use the saved index to put the samples back in the original order.
set.seed(8675309)
weights = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
nrow = 5, ncol = 3, byrow = T)
rowIndices = c(2,1,2,4)
vectorToDrawFrom = runif(3, 0.0, 2.0)
set.seed(8675309)
## This is the original code
sample2 = sapply(rowIndices, function(idx){
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})
rowIndx <- order(rowIndices) #get ordering index
rowIndices <- sort(rowIndices)
set.seed(8675309)
samples = unlist(sapply(unique(rowIndices), function(idx){
  sample(vectorToDrawFrom, sum(rowIndices %in% idx), TRUE, weights[idx, ])
}))
samples = samples[order(rowIndx)]
all(samples == sample2)
#[1] TRUE
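As a variation on the same idea, the positions belonging to each row index can be grouped with split and the draws written straight back into those positions, so no sorting or reordering step is needed. This is a minimal sketch under the question's setup; the seed and the names positions, key and pos are illustrative assumptions. Each element is still an independent draw using its own row's weights, though for a given seed the individual values need not match the sapply version.

```r
set.seed(42)  # arbitrary seed for a repeatable illustration
rowIndices <- floor(runif(1000, 1, 100))
vectorToDrawFrom <- runif(5000, 0.0, 2.0)
weights <- matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)

# positions[["k"]] holds the positions in rowIndices whose value is k.
positions <- split(seq_along(rowIndices), rowIndices)

samples <- numeric(length(rowIndices))
for (key in names(positions)) {
  pos <- positions[[key]]
  # One sample() call per distinct row, drawing length(pos) values at once
  # and storing them at the original positions of that row index.
  samples[pos] <- sample(vectorToDrawFrom, length(pos), TRUE,
                         weights[as.integer(key), ])
}
```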