Faster alternative to compute colCumsums of a band matrix

Question

I am new to R and stats. In the domain I am currently working in, I need to compute cumulative column sums in an unusual way.

Initially, a square band matrix of width b with n rows is provided. For example, for n = 8 and b = 3:

0 1 2 7 0 0 0 0
0 0 3 6 7 0 0 0
0 0 0 3 1 7 0 0
0 0 0 0 4 4 7 0
0 0 0 0 0 5 8 7
0 0 0 0 0 0 1 8
0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0   

The matrix is then transformed into an n x b matrix whose columns are the diagonals of the original. For the given example:

1 2 7  
3 6 7 
3 1 7 
4 4 7 
5 8 7 
1 8 0
4 0 0
0 0 0

I am currently using the following function to perform this operation.

     packedband <- function(x, n, b) {
      mat <- sapply(0:(b-1), function(i)
         diag(x[-(n:(n-i)), -(1:(1+i))])[1:n] )
      mat[is.na(mat)] <- 0
      return(mat)
      }
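
For comparison, the same packing can be sketched with a single matrix-indexing step, using the rule that packed cell [i, k] holds x[i, i + k]. This is a minimal base R sketch on the example matrix above, not the asker's method; the names rows, cols, inside and packed are illustrative.

```r
# Example band matrix from the question (n = 8, b = 3), entered row by row.
x <- matrix(c(
  0, 1, 2, 7, 0, 0, 0, 0,
  0, 0, 3, 6, 7, 0, 0, 0,
  0, 0, 0, 3, 1, 7, 0, 0,
  0, 0, 0, 0, 4, 4, 7, 0,
  0, 0, 0, 0, 0, 5, 8, 7,
  0, 0, 0, 0, 0, 0, 1, 8,
  0, 0, 0, 0, 0, 0, 0, 4,
  0, 0, 0, 0, 0, 0, 0, 0
), nrow = 8, byrow = TRUE)

n <- nrow(x)
b <- 3

# Source position of every packed cell, listed in column-major order:
# packed[i, k] takes its value from x[i, i + k].
rows <- rep(seq_len(n), times = b)
cols <- rows + rep(seq_len(b), each = n)

# Cells whose source column would fall outside the matrix stay zero.
inside <- cols <= n

packed <- matrix(0, n, b)
packed[inside] <- x[cbind(rows[inside], cols[inside])]
packed
```

Indexing with a two-column matrix picks out one element per row of that matrix, so no submatrix is built for each diagonal.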

I then apply the colCumsums function from the matrixStats package to obtain the desired output matrix. For the given example:

1    2     7
4    8    14
7    9    21
11   13   28
16   21   35
17   29   35
21   29   35
21   29   35
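
The output above is an independent running total down each column of the packed matrix. As a quick check that does not need matrixStats, the same numbers can be reproduced in base R; this is only an illustrative sketch, and the variable names are assumptions.

```r
# Packed matrix from the example above, entered row by row.
packed <- matrix(c(
  1, 2, 7,
  3, 6, 7,
  3, 1, 7,
  4, 4, 7,
  5, 8, 7,
  1, 8, 0,
  4, 0, 0,
  0, 0, 0
), ncol = 3, byrow = TRUE)

# Running total down each column (margin 2 = columns).
cumulative <- apply(packed, 2, cumsum)
cumulative
```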

What I am looking for is a faster way to compute these operations, since in this domain the number of columns (or rows) can exceed 10^5. The packedband step can probably be skipped, since the end goal is only the cumulative column sums. Thanks in advance.

Recommended answer

After messing about with sparse matrices, I think a for loop may work well here.

Try on the original data

d = as.matrix(read.table(text="0 1 2 7 0 0 0 0
0 0 3 6 7 0 0 0
0 0 0 3 1 7 0 0
0 0 0 0 4 4 7 0
0 0 0 0 0 5 8 7
0 0 0 0 0 0 1 8
0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0 "))

colnames(d) <- NULL

Functions

packedband <- function(x, b=3) {
      # use the argument x here, not the global d
      n = nrow(x)
      mat <- sapply(0:(b-1), function(i)
                  diag(x[-(n:(n-i)), -(1:(1+i))])[1:n] )
      mat[is.na(mat)] <- 0
      matrixStats::colCumsums(mat)
   }

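forloop <- function(d, b=3){
     n = nrow(d)
     m = matrix(0, n, b)
      for(i in 1:b) {
        # row and column positions of the i-th superdiagonal
        ro = 1:(n-i)
        co = (1+i):n
        # extract the diagonal, then pad it to length n (padding is NA)
        vec = `length<-`(d[cbind(ro, co)], n)
        vec[is.na(vec)] <- 0
        m[ , i] = cumsum(vec)
      }
     m
   }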

# Create the sparse matrix up front so that conversion time is left out of the timing.
# If the sparse route is faster, it may be worth storing your band matrices in sparse format.
library(Matrix)
m <- as(d, "TsparseMatrix")

spm <- function(m, b=3){
     # m@i and m@j are 0-based, so m@j - m@i is the 1-based diagonal number
     x = sparseMatrix(i = m@i+1,
                      j = m@j - m@i,
                      x = m@x,
                      dims = c(nrow(m),b))
     matrixStats::colCumsums(as.matrix(x))
   }

all.equal(forloop(d), packedband(d))
all.equal(spm(m), packedband(d))

Try on larger data

d = matrix(0, 5e3, 5e3)
d[(col(d) - row(d)) == 1] <- 1
d[(col(d) - row(d)) == 2] <- 1
d[(col(d) - row(d)) == 3] <- 1

m <- as(d, "TsparseMatrix") 

all.equal(forloop(d), packedband(d))
all.equal(spm(m), packedband(d))

microbenchmark::microbenchmark(packedband(d), forloop(d), spm(m), times=50)
# Unit: microseconds
#           expr         min          lq        mean      median          uq         max neval cld
#  packedband(d) 1348240.520 1724714.293 1740634.707 1733305.192 1763377.869 1960353.263    50   b
#     forloop(d)     720.344     973.658    1054.461    1026.807    1174.731    1565.912    50  a 
#         spm(m)    2145.875    2437.321    2586.503    2480.133    2749.019    3766.051    50  a 
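
The benchmark above uses a dense 5e3 x 5e3 matrix. At the size mentioned in the question (n > 10^5), a dense n x n matrix of doubles would need roughly 80 GB, so the band matrix would probably have to be built and kept in sparse form from the start. The following is a rough sketch under that assumption: the test data (ones on the first b superdiagonals), the value of n, and the names band, trip, packed and result are all hypothetical, and base R's apply is swapped in for colCumsums to keep the dependencies to the Matrix package.

```r
library(Matrix)

n <- 1e5  # assumed size, taken from the scale mentioned in the question
b <- 3

# Hypothetical test data: ones on the first b superdiagonals,
# built directly in sparse form so no dense n x n matrix is allocated.
band <- bandSparse(n, k = seq_len(b),
                   diagonals = lapply(seq_len(b), function(k) rep(1, n - k)))

# Triplet form exposes 0-based row (i) and column (j) indices.
trip <- as(band, "TsparseMatrix")

# Same remapping as spm(): the diagonal number j - i becomes the packed column.
packed <- sparseMatrix(i = trip@i + 1L,
                       j = trip@j - trip@i,
                       x = trip@x,
                       dims = c(n, b))

# The packed matrix is only n x b, so converting it to dense is cheap.
result <- apply(as.matrix(packed), 2, cumsum)
result[n, ]
```

Whether this beats the for loop on real data is not something the timings above can tell; it would need its own benchmark.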
