Extraction speed in Matrix package is very slow compared to regular matrix class


Question

This example compares row extraction from large dense and sparse matrices, using the Matrix package versus R's base matrix class.

For dense matrices, the base matrix class is almost 400 times faster by minimum time (about 180 times faster by median time):

library(Matrix)
library(microbenchmark)

## row extraction in dense matrices
D1<-matrix(rnorm(2000^2), 2000, 2000)
D2<-Matrix(D1)
microbenchmark(D1[1,], D2[1,])
Unit: microseconds
    expr      min        lq       mean    median       uq      max neval
 D1[1, ]   14.437   15.9205   31.72903   31.4835   46.907   75.101   100
 D2[1, ] 5730.730 5744.0130 5905.11338 5777.3570 5851.083 7447.118   100

For sparse matrices, base matrix comes out ahead again: about 63 times faster by mean time (about 100 times faster by median time).

## row extraction in sparse matrices
S1<-matrix(1*(runif(2000^2)<0.1), 2000, 2000)
S2<-Matrix(S1, sparse = TRUE)
microbenchmark(S1[1,], S2[1,])
Unit: microseconds
    expr      min       lq       mean    median        uq      max neval
 S1[1, ]   15.225   16.417   28.15698   17.7655   42.9905   45.692   100
 S2[1, ] 1652.362 1670.507 1771.51695 1774.1180 1787.0410 5241.863   100

Why is the speed difference so large, and is there a way to speed up extraction in the Matrix package?

Recommended answer

I don't know exactly what the trouble is; possibly it is S4 dispatch, which could be a large share of the cost of a small call like this. I was able to get performance comparable to base matrix (which has a pretty easy job: indexing and accessing a contiguous chunk of memory) by (1) switching to a row-major format and (2) writing my own special-purpose accessor function. I don't know exactly what you want to do, or whether it will be worth the trouble...
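The same idea of bypassing the `[` method can be sketched for the dense case from the question. This is a minimal sketch, assuming D2 is a dgeMatrix (which is what Matrix() returns for a general dense numeric matrix) and relying on its @x slot storing values in column-major order, so row i is every nrow-th element starting at position i. The helper name dge_row_extract is hypothetical, not a Matrix function, and the timing benefit is not measured here.

```r
library(Matrix)

## Hypothetical helper (not part of Matrix): row i of a dense dgeMatrix,
## read directly from the column-major @x slot without calling `[`.
dge_row_extract <- function(m, i = 1) {
  stopifnot(is(m, "dgeMatrix"))
  nr <- m@Dim[1]                        # number of rows
  nc <- m@Dim[2]                        # number of columns
  m@x[i + (seq_len(nc) - 1) * nr]       # elements i, i + nr, i + 2*nr, ...
}

set.seed(101)
D1 <- matrix(rnorm(2000^2), 2000, 2000)
D2 <- Matrix(D1)                        # general dense numeric -> dgeMatrix
all.equal(dge_row_extract(D2, 1), D1[1, ])   # TRUE
```

The `- 1` (a double rather than `1L`) keeps the index arithmetic in doubles, so very large matrices do not hit integer overflow.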

Set up the example:

set.seed(101)
S1 <- matrix(1*(runif(2000^2)<0.1), 2000, 2000)

Convert to column-major (dgCMatrix) and row-major (dgRMatrix) forms:

library(Matrix)
S2C <- Matrix(S1, sparse = TRUE)
S2R <- as(S1,"dgRMatrix")
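To see why the row-major form makes row access cheap, it helps to look at the slots of both formats on a tiny matrix. In a dgRMatrix, the nonzeros of row i occupy the 1-based positions p[i]+1 to p[i+1] of the @j and @x slots, one contiguous slice; in a dgCMatrix the same holds for columns via @p and @i. The 3 x 4 matrix below (with an empty second row) is illustrative data, and the coercion via as(., "RsparseMatrix") is my choice here rather than the as(S1, "dgRMatrix") call used above.

```r
library(Matrix)

## Illustrative 3 x 4 matrix with an empty second row
m <- matrix(c(1, 0, 0, 4,
              0, 0, 0, 0,
              0, 2, 3, 0), nrow = 3, byrow = TRUE)

mC <- Matrix(m, sparse = TRUE)        # dgCMatrix: compressed columns
mR <- as(mC, "RsparseMatrix")         # dgRMatrix: compressed rows

## Column-major: @p has ncol + 1 column pointers, @i holds 0-based row indices
mC@p   # 0 1 2 3 4
mC@i   # 0 2 2 0
mC@x   # 1 2 3 4

## Row-major: @p has nrow + 1 row pointers, @j holds 0-based column indices
mR@p   # 0 2 2 4   (row 2 is empty: p[2] == p[3])
mR@j   # 0 3 1 2
mR@x   # 1 4 2 3
```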

Custom accessor:

my_row_extract <- function(m,i=1) {
    ## m is a dgRMatrix: m@p holds row pointers (length nrow+1),
    ## m@j the 0-based column indices and m@x the nonzero values
    r <- numeric(ncol(m))   ## set up zero vector for results
    ## suggested by @OttToomet, handles empty rows
    inds <- seq(from=m@p[i]+1, 
                to=m@p[i+1], length.out=max(0, m@p[i+1] - m@p[i]))
    r[m@j[inds]+1] <- m@x[inds]     ## set values
    return(r)
}
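A quick way to confirm that the accessor handles every row, including empty ones, is to compare it with base indexing on a small matrix. This sketch repeats the accessor so the block runs on its own; the 3 x 4 matrix with an empty second row is illustrative data, not from the question.

```r
library(Matrix)

## Accessor from the answer, repeated so this block is self-contained
my_row_extract <- function(m, i = 1) {
  r <- numeric(ncol(m))                                   # zero vector for results
  inds <- seq(from = m@p[i] + 1, to = m@p[i + 1],
              length.out = max(0, m@p[i + 1] - m@p[i]))   # empty for empty rows
  r[m@j[inds] + 1] <- m@x[inds]                           # @j is 0-based
  r
}

## Illustrative matrix with an empty second row
m  <- matrix(c(1, 0, 0, 4,
               0, 0, 0, 0,
               0, 2, 3, 0), nrow = 3, byrow = TRUE)
mR <- as(Matrix(m, sparse = TRUE), "RsparseMatrix")

## Compare against base row indexing for every row
rows_ok <- vapply(seq_len(nrow(m)),
                  function(i) isTRUE(all.equal(my_row_extract(mR, i), m[i, ])),
                  logical(1))
rows_ok                  # TRUE TRUE TRUE
my_row_extract(mR, 2)    # 0 0 0 0
```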

Check equality of results across methods (all TRUE):

all.equal(S2C[1,],S1[1,])
all.equal(S2C[1,],S2R[1,])
all.equal(my_row_extract(S2R,1),S2R[1,])
all.equal(my_row_extract(S2R,17),S2R[17,])

Benchmark:

library(rbenchmark)   ## benchmark() with the columns= argument comes from rbenchmark
benchmark(S1[1,], S2C[1,], S2R[1,], my_row_extract(S2R,1),
          columns=c("test","elapsed","relative"))
##                     test elapsed relative
## 4 my_row_extract(S2R, 1)   0.015    1.154
## 1                S1[1, ]   0.013    1.000
## 2               S2C[1, ]   0.563   43.308
## 3               S2R[1, ]   4.113  316.385

The special-purpose extractor is competitive with base matrices. S2R[1,] is extremely slow, surprisingly even for row extraction; however, ?"dgRMatrix-class" does say:

Note: The column-oriented sparse classes, e.g., ‘dgCMatrix’, are preferred and better supported in the ‘Matrix’ package.
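Since the column-oriented class is the preferred one, the same trick can be mirrored for columns of a dgCMatrix: the nonzeros of column j sit at 1-based positions p[j]+1 to p[j+1] of the @i and @x slots. This is a minimal sketch; my_col_extract is a hypothetical helper written by analogy with my_row_extract, not a Matrix function, and it uses seq_len() for the index range, which is empty for all-zero columns.

```r
library(Matrix)

## Hypothetical column counterpart of my_row_extract() for a dgCMatrix
my_col_extract <- function(m, j = 1) {
  r <- numeric(nrow(m))                            # zero vector for the result
  inds <- m@p[j] + seq_len(m@p[j + 1] - m@p[j])    # positions of column j's nonzeros
  r[m@i[inds] + 1] <- m@x[inds]                    # @i holds 0-based row indices
  r
}

set.seed(101)
S1  <- matrix(1 * (runif(2000^2) < 0.1), 2000, 2000)
S2C <- Matrix(S1, sparse = TRUE)                   # dgCMatrix
all.equal(my_col_extract(S2C, 1), S1[, 1])         # TRUE
```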
