R:是否可以对这个双循环进行矢量化/加速? [英] R: Is it possible to vectorise / speed-up this double loop?

查看:25
本文介绍了R:是否可以对这个双循环进行矢量化/加速?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个高级别的一般性问题.有一些类似的例子,但有不同的、更简洁的例子.或许无法回答.conn 是一个矩阵.

This is a high-level, general question. There are some similar ones around with different, and more concise, examples. Perhaps it cannot be answered. conn is a matrix.

     for (i in 2:dim(conn)[1]) {
        for (j in 2:dim(conn)[1]) {
          if ((conn[i, 1] == conn[1, j]) & conn[i, 1] != 0) {
              conn[i, j] <- 1
              conn[j, i] <- 1
              }
              else {
                conn[i, j] <- 0
                conn[j, i] <- 0
                }
           }
      }

这直接来自 clusterCons 包中的 cluscomp.

This comes straight out of cluscomp from the clusterCons package.

我的问题很简单:是否可以加速循环或对其进行矢量化?作为 R 初学者,我看不到它,也不想最终感到沮丧,因为这可能是不可能的.我会接受任何可以说是或否的答案,并暗示可能需要付出的努力.

My question is simply: is it possible to speed up the loop or to vectorise it? As an R beginner, I cannot see it and don't want to end up with frustration because it may not be possible. I'll accept any answer that can say yes or no and hint towards the potential amount of effort involved.

推荐答案

非矩阵解决方案 - 应该非常快,假设 conn 是非负且对称的...

Non-matrixy solution - should be pretty damn fast, assuming conn is non-negative and symmetric...

connmake = function(conn){
  ordering = order(conn[,1])
  breakpoints = which(diff(conn[ordering,1]) != 0)
  if (conn[ordering[1], 1] != 0){
    breakpoints = c(1, breakpoints + 1, nrow(conn) + 1)
  } else {
    breakpoints = c(breakpoints + 1, nrow(conn) +1)
  }
  output = matrix(0, nrow(conn), nrow(conn))

  for (i in 1:(length(breakpoints) - 1)){
    output[ ordering[breakpoints[i]:(breakpoints[i+1] -1)],
        ordering[breakpoints[i]:(breakpoints[i+1] -1)]] =  1
  }
  output[,1] = conn[,1]
  output[1,] = conn[,1]
  output
}

一些使用早期基准测试的测试代码.(原始代码实现为 orig()f2() 是较早的建议.)

Some test code using earlier benchmarking. (Original code is implemented as orig() , f2() is earlier suggestion.)

size = 2000
conn  = matrix(0, size, size)
conn[1,] = sample( 1:20, size, replace = T)
conn[,1] = conn[1,]

system.time(orig(conn) -> out1)
#user  system elapsed 
#20.54    0.00   20.54 
system.time(f2(conn) -> out2)
#user  system elapsed
#0.39    0.02    0.41 
system.time(connmake(conn) -> out3)
#user  system elapsed 
#0.02    0.00    0.01 
identical(out1, out2)
#[1] TRUE
identical(out1, out3)
#[1] TRUE

请注意,对于包含 0 的 conn,f2 实际上失败了,但不是我的问题,是吗?带有负值的 conn 可以简单地通过例如处理通过安全偏移增加相关值.非对称连接需要更多思考,但应该是可行的...

Note that f2 actually fails for conn containing 0, but not my problem, eh? conn with negative values can be dealt with simply by e.g. increasing the relevant values by a safe offset. Non-symmetric conn will require more thought, but should be doable...

一般的教训是,与成对比较相比,排序速度更快.成对比较是 O(N^2),而 R 中最慢的排序算法是 O(N^4/3).一旦数据被排序,比较就变得微不足道了.

The general lesson is that sort is fast compared to pairwise comparison. Pairwise comparison is O(N^2), while the slowest sort algorithm in R is O(N^4/3). Once the data is sorted, comparisons become trivial.

这篇关于R:是否可以对这个双循环进行矢量化/加速?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆