Efficient way to vectorize R for loops

Question

How can the following R code be vectorized to reduce computing time?

q = matrix(0,n,p)
for(u in 1 : n){
  q1 <- matrix(0,p,1)
  for(iprime in 1 : n){
    for(i in 1 : n){
      if(cause[iprime]==1 & cause[i]>1 & (time[i]<time[u]) & (time[u] <= time[iprime])){
        q1 = q1 + (covs[i,] - S1byS0hat[iprime,])*G[iprime]/G[i]*expz[i]/S0hat[iprime]
      }
    }
  }
  q[u,] = q1/(m*m)
}
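Written as a formula (a restatement of the loop above; the notation t = time and i' = iprime is introduced here for readability), each row of q is a double sum over the pairs that pass the if() test:

```latex
q_{u,\cdot} = \frac{1}{m^2} \sum_{i'=1}^{n} \sum_{i=1}^{n}
  \mathbb{1}\left\{ \mathrm{cause}_{i'} = 1,\ \mathrm{cause}_{i} > 1,\ t_i < t_u \le t_{i'} \right\}
  \left( \mathrm{covs}_{i,\cdot} - \mathrm{S1byS0hat}_{i',\cdot} \right)
  \frac{G_{i'}\, \mathrm{expz}_i}{G_i\, \mathrm{S0hat}_{i'}}
```

The weight factorizes into a part that depends only on i' (G / S0hat) and a part that depends only on i (expz / G), which is what makes the inner loops separable.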

The following values can be used as an example:

n = 2000
m = 500
p=3
G = runif(n)
time = runif(n,0.01,5)
cause = c(rep(0,600),rep(1,1000),rep(2,400))
covs = matrix(rnorm(n*p),n,p)
S1byS0hat = matrix(rnorm(n*p),n,p)
S0hat = rnorm(n)
expz = rnorm(n)

Answer

Benchmarking your solution:

coeff <- 10
n = 20 * coeff
m = 500
p = 3
G = runif(n)
time = runif(n, 0.01, 5)
cause = c(rep(0, 6 * coeff), rep(1, 10 * coeff), rep(2, 4 * coeff))
covs = matrix(rnorm(n * p), n, p)
S1byS0hat = matrix(rnorm(n * p), n, p)
S0hat = rnorm(n)
expz = rnorm(n)

system.time({
  q = matrix(0,n,p)
  for(u in 1 : n){
    q1 <- matrix(0,p,1)
    for(iprime in 1 : n){
      for(i in 1 : n){
        if(cause[iprime]==1 & cause[i]>1 & (time[i]<time[u]) & (time[u] <= time[iprime])){
          q1 = q1 + (covs[i,] - S1byS0hat[iprime,])*G[iprime]/G[i]*expz[i]/S0hat[iprime]
        }
      }

    }
    q[u,] = q1/(m*m)
  }
})

It takes 9 sec on my computer (with coeff = 10, i.e. n = 200, instead of coeff = 100, i.e. the question's n = 2000; we can increase it later for the other solutions).
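Why the benchmark starts at coeff = 10: the original code evaluates its if() test once per (u, iprime, i) triple, so the number of iterations grows as n³. This is a plain operation count, not a measured timing:

```latex
\begin{aligned}
n &= 20 \cdot \mathrm{coeff}, \qquad \text{iterations} = n^3,\\
\mathrm{coeff} = 10 &: \quad 200^3 = 8 \times 10^{6},\\
\mathrm{coeff} = 100 &: \quad 2000^3 = 8 \times 10^{9}.
\end{aligned}
```

Going from coeff = 10 to coeff = 100 therefore multiplies the loop work by 10³.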

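A first solution is to precompute what does not change inside the inner loops: the weight factors c1 and c2 once, and the two valid index sets once per u, so that the inner loops only visit the pairs that pass the if() test: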

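q2 = matrix(0, n, p)
c1 <- G / S0hat  # iprime-only part of the weight: G[iprime] / S0hat[iprime]
c2 <- expz / G   # i-only part of the weight: expz[i] / G[i]
for (u in 1:n) {
  q1 <- rep(0, p)
  # keep only the indices that satisfy the original if() condition
  ind_iprime <- which(cause == 1 & time[u] <= time)
  ind_i <- which(cause > 1 & time < time[u])
  for (iprime in ind_iprime) {
    for (i in ind_i) {
      q1 = q1 + (covs[i, ] - S1byS0hat[iprime, ]) * c1[iprime] * c2[i]
    }
  }
  q2[u, ] = q1
}
q2 <- q2 / (m * m)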

This takes 0.3 sec for coeff = 10 and 6 min for coeff = 100.
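A quick way to confirm that the precomputation does not change the result is to run both versions on a small problem and compare them with all.equal(). The factorization relies on G[iprime]/G[i]*expz[i]/S0hat[iprime] being equal to (G[iprime]/S0hat[iprime]) * (expz[i]/G[i]) = c1[iprime] * c2[i]. The sketch below assumes a hypothetical seed and a reduced size n = 60 (not the benchmark's data) so that the triple loop finishes quickly:

```r
# Hypothetical small instance (not the original benchmark data).
set.seed(1)
n <- 60; m <- 500; p <- 3
G <- runif(n)
time <- runif(n, 0.01, 5)
cause <- c(rep(0, 18), rep(1, 30), rep(2, 12))
covs <- matrix(rnorm(n * p), n, p)
S1byS0hat <- matrix(rnorm(n * p), n, p)
S0hat <- rnorm(n)
expz <- rnorm(n)

# Reference: the question's triple loop.
q <- matrix(0, n, p)
for (u in 1:n) {
  q1 <- rep(0, p)
  for (iprime in 1:n) {
    for (i in 1:n) {
      if (cause[iprime] == 1 && cause[i] > 1 &&
          time[i] < time[u] && time[u] <= time[iprime]) {
        q1 <- q1 + (covs[i, ] - S1byS0hat[iprime, ]) *
          G[iprime] / G[i] * expz[i] / S0hat[iprime]
      }
    }
  }
  q[u, ] <- q1 / (m * m)
}

# Precomputed version: the scalar weight factors as c1[iprime] * c2[i].
c1 <- G / S0hat
c2 <- expz / G
q2 <- matrix(0, n, p)
for (u in 1:n) {
  q1 <- rep(0, p)
  for (iprime in which(cause == 1 & time[u] <= time)) {
    for (i in which(cause > 1 & time < time[u])) {
      q1 <- q1 + (covs[i, ] - S1byS0hat[iprime, ]) * c1[iprime] * c2[i]
    }
  }
  q2[u, ] <- q1
}
q2 <- q2 / (m * m)

stopifnot(isTRUE(all.equal(q, q2)))
```

all.equal() is used rather than identical() because the reordered multiplications can differ in the last bits.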

Then you can vectorize at least one loop, the one over i:

q3 <- matrix(0, n, p)
c1 <- G / S0hat
c2 <- expz / G
# scale each row once: covs_c2[i, ] = covs[i, ] * c2[i], S1byS0hat_c1[iprime, ] = S1byS0hat[iprime, ] * c1[iprime]
covs_c2 <- sweep(covs, 1, c2, '*')
S1byS0hat_c1 <- sweep(S1byS0hat, 1, c1, '*')
for (u in 1:n) {
  q1 <- rep(0, p)
  ind_iprime <- which(cause == 1 & time[u] <= time)
  ind_i <- which(cause > 1 & time < time[u])
  for (iprime in ind_iprime) {
    # the whole sum over i is done at once with colSums() and sum()
    q1 <- q1 + colSums(covs_c2[ind_i, , drop = FALSE]) * c1[iprime] -
      S1byS0hat_c1[iprime, ] * sum(c2[ind_i])
  }
  q3[u, ] <- q1
}
q3 <- q3 / (m * m)

This takes only 15 sec.
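The same algebra can also remove the loop over iprime, since the double sum splits into products of one-dimensional sums: sum(c1[ind_iprime]) * colSums(covs_c2[ind_i, ]) - colSums(S1byS0hat_c1[ind_iprime, ]) * sum(c2[ind_i]). Going one step further, the index sets for all u can be stored as 0/1 matrices, so every per-u sum becomes a single matrix product and no loop is left. This goes beyond the answer's code and is only a sketch: A, B and q_vec are hypothetical names, the data are a small made-up instance, and the two n x n matrices cost O(n²) memory (tens of MB for n = 2000).

```r
# Hypothetical small instance (not the original benchmark data).
set.seed(1)
n <- 60; m <- 500; p <- 3
G <- runif(n)
time <- runif(n, 0.01, 5)
cause <- c(rep(0, 18), rep(1, 30), rep(2, 12))
covs <- matrix(rnorm(n * p), n, p)
S1byS0hat <- matrix(rnorm(n * p), n, p)
S0hat <- rnorm(n)
expz <- rnorm(n)

# Reference: the question's triple loop.
q <- matrix(0, n, p)
for (u in 1:n) {
  q1 <- rep(0, p)
  for (iprime in 1:n) {
    for (i in 1:n) {
      if (cause[iprime] == 1 && cause[i] > 1 &&
          time[i] < time[u] && time[u] <= time[iprime]) {
        q1 <- q1 + (covs[i, ] - S1byS0hat[iprime, ]) *
          G[iprime] / G[i] * expz[i] / S0hat[iprime]
      }
    }
  }
  q[u, ] <- q1 / (m * m)
}

c1 <- G / S0hat
c2 <- expz / G
# A[u, iprime] = 1 when cause[iprime] == 1 and time[u] <= time[iprime]
A <- outer(time, time, "<=") * matrix(cause == 1, n, n, byrow = TRUE)
# B[u, i] = 1 when cause[i] > 1 and time[i] < time[u]
B <- outer(time, time, ">") * matrix(cause > 1, n, n, byrow = TRUE)

# Each per-u sum over iprime or over i is one matrix product.
sumA_c1 <- drop(A %*% c1)           # length n
sumA_S1 <- A %*% (S1byS0hat * c1)   # n x p, row iprime scaled by c1[iprime]
sumB_c2 <- drop(B %*% c2)           # length n
sumB_cov <- B %*% (covs * c2)       # n x p, row i scaled by c2[i]

q_vec <- (sumA_c1 * sumB_cov - sumA_S1 * sumB_c2) / (m * m)
stopifnot(isTRUE(all.equal(q, q_vec, check.attributes = FALSE)))
```

Multiplying an n x p matrix by a length-n vector recycles the vector down the columns, so each row r is scaled by element r; this is the same operation as the answer's sweep(..., 1, ..., '*').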

If you care about further performance, a good strategy might be to recode this in Rcpp, especially to avoid lots of memory allocations.
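Before reaching for Rcpp, a different technique may also help: the conditions only compare times, so sorting the candidates by time and taking cumulative sums gives every per-u sum at once, in O(n log n) time and O(n p) memory. This is a swapped-in technique (sorting plus cumsum() and findInterval()), not the Rcpp rewrite suggested above; prefix_sums, kA, kB and q_fast are hypothetical names and the data are a small made-up instance.

```r
# Hypothetical small instance (not the original benchmark data).
set.seed(1)
n <- 60; m <- 500; p <- 3
G <- runif(n)
time <- runif(n, 0.01, 5)
cause <- c(rep(0, 18), rep(1, 30), rep(2, 12))
covs <- matrix(rnorm(n * p), n, p)
S1byS0hat <- matrix(rnorm(n * p), n, p)
S0hat <- rnorm(n)
expz <- rnorm(n)

# Reference: the question's triple loop.
q <- matrix(0, n, p)
for (u in 1:n) {
  q1 <- rep(0, p)
  for (iprime in 1:n) {
    for (i in 1:n) {
      if (cause[iprime] == 1 && cause[i] > 1 &&
          time[i] < time[u] && time[u] <= time[iprime]) {
        q1 <- q1 + (covs[i, ] - S1byS0hat[iprime, ]) *
          G[iprime] / G[i] * expz[i] / S0hat[iprime]
      }
    }
  }
  q[u, ] <- q1 / (m * m)
}

c1 <- G / S0hat
c2 <- expz / G
# Column-wise cumulative sums with a leading zero row:
# row k + 1 holds the sum of the first k rows of M.
prefix_sums <- function(M) rbind(0, apply(M, 2, cumsum))

# i candidates (cause > 1) in increasing time order.
ib <- which(cause > 1)
ib <- ib[order(time[ib])]
# kB[u] = number of candidates with time[i] < time[u]
kB <- findInterval(time, time[ib], left.open = TRUE)
sumB_c2 <- c(0, cumsum(c2[ib]))[kB + 1]
sumB_cov <- prefix_sums(covs[ib, , drop = FALSE] * c2[ib])[kB + 1, , drop = FALSE]

# iprime candidates (cause == 1) in decreasing time order, so the ones
# with time[iprime] >= time[u] form a prefix of length kA[u].
ia <- which(cause == 1)
ia <- ia[order(time[ia], decreasing = TRUE)]
kA <- length(ia) - findInterval(time, sort(time[ia]), left.open = TRUE)
sumA_c1 <- c(0, cumsum(c1[ia]))[kA + 1]
sumA_S1 <- prefix_sums(S1byS0hat[ia, , drop = FALSE] * c1[ia])[kA + 1, , drop = FALSE]

q_fast <- (sumA_c1 * sumB_cov - sumA_S1 * sumB_c2) / (m * m)
stopifnot(isTRUE(all.equal(q, q_fast, check.attributes = FALSE)))
```

With left.open = TRUE, findInterval(x, v) returns the number of elements of the sorted vector v that are strictly less than x, which matches the strict condition time[i] < time[u]; the iprime side uses the complement count to get time[iprime] >= time[u].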
