Extraction speed in Matrix package is very slow compared to regular matrix class
Question
This is an example of comparing row extraction from large matrices, sparse and dense, using the Matrix package versus the regular R
base-matrix class.
For dense matrices, the base class matrix is almost 395 times faster:
library(Matrix)
library(microbenchmark)
## row extraction in dense matrices
D1<-matrix(rnorm(2000^2), 2000, 2000)
D2<-Matrix(D1)
> microbenchmark(D1[1,], D2[1,])
Unit: microseconds
    expr      min        lq       mean    median       uq      max neval
 D1[1, ]   14.437   15.9205   31.72903   31.4835   46.907   75.101   100
 D2[1, ] 5730.730 5744.0130 5905.11338 5777.3570 5851.083 7447.118   100
For sparse matrices, matrix is again faster, by a factor of almost 63.
## row extraction in sparse matrices
S1<-matrix(1*(runif(2000^2)<0.1), 2000, 2000)
S2<-Matrix(S1, sparse = TRUE)
microbenchmark(S1[1,], S2[1,])
Unit: microseconds
    expr      min       lq       mean    median        uq      max neval
 S1[1, ]   15.225   16.417   28.15698   17.7655   42.9905   45.692   100
 S2[1, ] 1652.362 1670.507 1771.51695 1774.1180 1787.0410 5241.863   100
Why the speed discrepancy, and is there a way to speed up extraction in Matrix package?
Recommended answer
I don't know exactly what the trouble is, possibly S4 dispatch (which could potentially be a big piece of a small call like this). I was able to get performance equivalent to matrix (which has a pretty easy job: indexing + accessing a contiguous chunk of memory) by (1) switching to a row-major format and (2) writing my own special-purpose accessor function. I don't know exactly what you want to do or if it will be worth the trouble ...
Set up the example:
set.seed(101)
S1 <- matrix(1*(runif(2000^2)<0.1), 2000, 2000)
Convert to column-major (dgCMatrix) and row-major (dgRMatrix) forms:
library(Matrix)
S2C <- Matrix(S1, sparse = TRUE)
S2R <- as(S1,"dgRMatrix")
Custom accessor:
my_row_extract <- function(m, i = 1) {
  r <- numeric(ncol(m))   ## set up zero vector for results
  ## suggested by @OttToomet, handles empty rows
  inds <- seq(from = m@p[i] + 1,
              to = m@p[i + 1],
              length.out = max(0, m@p[i + 1] - m@p[i]))
  r[m@j[inds] + 1] <- m@x[inds]   ## set values
  return(r)
}
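To see why the accessor works, it can help to look at the slots of a tiny row-major matrix. The following is a minimal sketch, not part of the original answer: the 3 x 3 matrix is made-up illustrative data, the accessor is repeated so the snippet runs on its own, and the coercion uses the virtual class "RsparseMatrix" (an assumption that a recent Matrix version is installed; for a numeric matrix this should yield a dgRMatrix, like the as(S1,"dgRMatrix") call above).

```r
library(Matrix)

## same accessor as in the answer, repeated so this snippet is self-contained
my_row_extract <- function(m, i = 1) {
  r <- numeric(ncol(m))
  inds <- seq(from = m@p[i] + 1, to = m@p[i + 1],
              length.out = max(0, m@p[i + 1] - m@p[i]))
  r[m@j[inds] + 1] <- m@x[inds]
  r
}

## toy matrix (illustrative data only); row 2 is deliberately empty
m  <- matrix(c(0, 2, 0,
               0, 0, 0,
               1, 0, 3), nrow = 3, byrow = TRUE)
mR <- as(m, "RsparseMatrix")

mR@p   ## 0 1 1 3 : row i occupies entries p[i]+1 .. p[i+1] of @j and @x
mR@j   ## 1 0 2   : zero-based column indices of the nonzero entries
mR@x   ## 2 1 3   : the nonzero values, stored row by row

my_row_extract(mR, 2)   ## 0 0 0 (empty row, so inds has length 0)
my_row_extract(mR, 3)   ## 1 0 3
```

Because p[2] and p[3] are equal, row 2 contributes no entries, which is the case the length.out argument guards against.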
Check equality of results across methods (all TRUE):
all.equal(S2C[1,],S1[1,])
all.equal(S2C[1,],S2R[1,])
all.equal(my_row_extract(S2R,1),S2R[1,])
all.equal(my_row_extract(S2R,17),S2R[17,])
Benchmark:
library(rbenchmark)   ## provides benchmark()
benchmark(S1[1,], S2C[1,], S2R[1,], my_row_extract(S2R,1),
          columns=c("test","elapsed","relative"))
##                     test elapsed relative
## 4 my_row_extract(S2R, 1)   0.015    1.154
## 1                S1[1, ]   0.013    1.000
## 2               S2C[1, ]   0.563   43.308
## 3               S2R[1, ]   4.113  316.385
The special-purpose extractor is competitive with base matrices. S2R is super-slow, even for row extraction (surprisingly); however, ?"dgRMatrix-class" does say:
Note: The column-oriented sparse classes, e.g., ‘dgCMatrix’, are preferred and better supported in the ‘Matrix’ package.
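Given that note, one possible variation (an assumption on my part, not something the original answer tested or benchmarked) is to keep the transpose of the matrix in the better-supported column-major form and extract columns of it instead. The helper name my_col_extract and the object S2Ct are hypothetical, and a smaller 200 x 200 matrix is used to keep the sketch quick:

```r
library(Matrix)

## hypothetical helper: column j of a dgCMatrix, mirroring my_row_extract
my_col_extract <- function(m, j = 1) {
  r <- numeric(nrow(m))              ## zero vector for the result
  n <- m@p[j + 1] - m@p[j]           ## number of nonzeros in column j
  inds <- m@p[j] + seq_len(n)        ## positions in @i and @x (empty if n == 0)
  r[m@i[inds] + 1] <- m@x[inds]      ## @i holds zero-based row indices
  r
}

set.seed(101)
S1   <- matrix(1 * (runif(200^2) < 0.1), 200, 200)
S2Ct <- as(t(S1), "CsparseMatrix")   ## column-major storage of the transpose

## column 17 of the transpose is row 17 of the original matrix
all.equal(my_col_extract(S2Ct, 17), S1[17, ])
```

Whether this is faster than my_row_extract on a dgRMatrix would need to be measured; the point is only that the same slot-based trick applies to the column-oriented class.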